
Incremental and Decremental Support Vector Machine Learning

Gert Cauwenberghs*
CLSP, ECE Dept., Johns Hopkins University, Baltimore, MD 21218
gert@jhu.edu

Tomaso Poggio
CBCL, BCS Dept., Massachusetts Institute of Technology, Cambridge, MA 02142
tp@ai.mit.edu

* On sabbatical leave at CBCL in MIT while this work was performed.

Abstract

An on-line recursive algorithm for training support vector machines, one vector at a time, is presented. Adiabatic increments retain the Kuhn-Tucker conditions on all previously seen training data, in a number of steps each computed analytically. The incremental procedure is reversible, and decremental "unlearning" offers an efficient method to exactly evaluate leave-one-out generalization performance. Interpretation of decremental unlearning in feature space sheds light on the relationship between generalization and geometry of the data.

1 Introduction

Training a support vector machine (SVM) requires solving a quadratic programming (QP) problem in a number of coefficients equal to the number of training examples. For very large data sets, standard numeric techniques for QP become infeasible. Practical techniques decompose the problem into manageable subproblems over part of the data [7, 5] or, in the limit, perform iterative pairwise [8] or component-wise [3] optimization. A disadvantage of these techniques is that they may give an approximate solution, and may require many passes through the data set to reach a reasonable level of convergence. An on-line alternative, which formulates the (exact) solution for ℓ training data in terms of that for ℓ − 1 data and one new data point, is presented here. The incremental procedure is reversible, and decremental "unlearning" of each training sample produces an exact leave-one-out estimate of generalization performance on the training set.

2 Incremental SVM Learning

Training an SVM "incrementally" on new data by discarding all previous data except their support vectors gives only approximate results [11]. In what follows we consider incremental learning as an exact on-line method to construct the solution recursively, one point at a time. The key is to retain the Kuhn-Tucker (KT) conditions on all previously seen data, while "adiabatically" adding a new data point to the solution.

2.1 Kuhn-Tucker conditions

In SVM classification, the optimal separating function reduces to a linear combination of kernels on the training data, f(x) = Σ_j α_j y_j K(x_j, x) + b, with training vectors x_i and corresponding labels y_i = ±1. In the dual formulation of the training problem, the coefficients α_i are obtained by minimizing a convex quadratic objective function under constraints [12]:

    \min_{0 \le \alpha_i \le C} W = \frac{1}{2} \sum_{i,j} \alpha_i Q_{ij} \alpha_j - \sum_i \alpha_i + b \sum_i y_i \alpha_i    (1)

with Lagrange multiplier (and offset) b, and with symmetric positive-definite kernel matrix Q_{ij} = y_i y_j K(x_i, x_j).
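The decision function f(x) and the margin quantity used throughout the paper can be evaluated directly from the training set. The following is a minimal illustrative sketch, not the authors' Matlab implementation; the Gaussian kernel choice and the NumPy-based helper names are assumptions:

```python
import numpy as np

def gaussian_kernel(x1, x2, sigma=1.0):
    """K(x1, x2) = exp(-||x1 - x2||^2 / (2 sigma^2)); kernel choice is an assumption."""
    return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * sigma ** 2))

def decision_function(x, X, y, alpha, b, kernel=gaussian_kernel):
    """f(x) = sum_j alpha_j y_j K(x_j, x) + b, as written in Section 2.1.
    X is an array of training vectors, y the labels, alpha the coefficients."""
    return sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X)) + b

def margin(i, X, y, alpha, b, kernel=gaussian_kernel):
    """g_i = y_i f(x_i) - 1: the quantity that appears in the KT conditions below."""
    return y[i] * decision_function(X[i], X, y, alpha, b, kernel) - 1.0
```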
The first-order conditions on W reduce to the Kuhn-Tucker (KT) conditions:

    g_i = \frac{\partial W}{\partial \alpha_i} = \sum_j Q_{ij} \alpha_j + y_i b - 1 = y_i f(x_i) - 1
        \begin{cases} \ge 0, & \alpha_i = 0 \\ = 0, & 0 \le \alpha_i \le C \\ \le 0, & \alpha_i = C \end{cases}    (2)

    \frac{\partial W}{\partial b} = \sum_j y_j \alpha_j = 0    (3)

which partition the training data D and corresponding coefficients {α_i, b}, i = 1, ..., ℓ, in three categories as illustrated in Figure 1 [9]: the set S of margin support vectors strictly on the margin (y_i f(x_i) = 1), the set E of error support vectors exceeding the margin (not necessarily misclassified), and the remaining set R of (ignored) vectors within the margin.

[Figure 1: Soft-margin classification SVM training, illustrating margin support vectors (g_i = 0), error vectors (α_i = C, g_i < 0), and ignored vectors (α_i = 0, g_i > 0).]

2.2 Adiabatic increments

The margin vector coefficients change value during each incremental step to keep all elements in D in equilibrium, i.e., keep their KT conditions satisfied. In particular, the KT conditions are expressed differentially as:

    \Delta g_i = Q_{ic} \, \Delta\alpha_c + \sum_{j \in S} Q_{ij} \, \Delta\alpha_j + y_i \, \Delta b, \quad \forall i \in D \cup \{c\}    (4)

    0 = y_c \, \Delta\alpha_c + \sum_{j \in S} y_j \, \Delta\alpha_j    (5)

where α_c is the coefficient being incremented, initially zero, of a "candidate" vector outside D. Since g_i ≡ 0 for the margin vector working set S = {s_1, ..., s_{ℓ_S}}, the changes in coefficients must satisfy:

    \mathcal{Q} \cdot \begin{bmatrix} \Delta b \\ \Delta\alpha_{s_1} \\ \vdots \\ \Delta\alpha_{s_{\ell_S}} \end{bmatrix}
        = - \begin{bmatrix} y_c \\ Q_{s_1 c} \\ \vdots \\ Q_{s_{\ell_S} c} \end{bmatrix} \Delta\alpha_c    (6)

with symmetric but not positive-definite Jacobian \mathcal{Q}:

    \mathcal{Q} = \begin{bmatrix}
        0 & y_{s_1} & \cdots & y_{s_{\ell_S}} \\
        y_{s_1} & Q_{s_1 s_1} & \cdots & Q_{s_1 s_{\ell_S}} \\
        \vdots & \vdots & \ddots & \vdots \\
        y_{s_{\ell_S}} & Q_{s_{\ell_S} s_1} & \cdots & Q_{s_{\ell_S} s_{\ell_S}}
    \end{bmatrix}    (7)

Thus, in equilibrium,

    \Delta b = \beta \, \Delta\alpha_c    (8)

    \Delta\alpha_j = \beta_j \, \Delta\alpha_c, \quad \forall j \in D    (9)

with coefficient sensitivities given by

    \begin{bmatrix} \beta \\ \beta_{s_1} \\ \vdots \\ \beta_{s_{\ell_S}} \end{bmatrix}
        = - \mathcal{R} \cdot \begin{bmatrix} y_c \\ Q_{s_1 c} \\ \vdots \\ Q_{s_{\ell_S} c} \end{bmatrix}    (10)

where R = \mathcal{Q}^{-1} is the inverse Jacobian, and β_j ≡ 0 for all j outside S. Substituted in (4), the margins change according to:

    \Delta g_i = \gamma_i \, \Delta\alpha_c, \quad \forall i \in D \cup \{c\}    (11)

with margin sensitivities

    \gamma_i = Q_{ic} + \sum_{j \in S} Q_{ij} \beta_j + y_i \beta, \quad \forall i \notin S    (12)

and γ_i ≡ 0 for all i in S.

2.3 Bookkeeping: upper limit on increment Δα_c

It has been tacitly assumed above that Δα_c is small enough so that no element of D moves across S, E and/or R in the process. Since the α_j and g_i change with α_c through (9) and (11), some bookkeeping is required to check each of the following conditions, and determine the largest possible increment Δα_c accordingly:

1. g_c ≤ 0, with equality when c joins S;
2. α_c ≤ C, with equality when c joins E;
3. 0 ≤ α_j ≤ C, ∀j ∈ S, with equality 0 when j transfers from S to R, and equality C when j transfers from S to E;
4. g_i ≤ 0, ∀i ∈ E, with equality when i transfers from E to S;
5. g_i ≥ 0, ∀i ∈ R, with equality when i transfers from R to S.

2.4 Recursive magic: R updates

To add candidate c to the working margin vector set S, the inverse Jacobian R is expanded as:

    \mathcal{R} \rightarrow \begin{bmatrix} \mathcal{R} & 0 \\ 0 & 0 \end{bmatrix}
        + \frac{1}{\gamma_c} \begin{bmatrix} \beta \\ \beta_{s_1} \\ \vdots \\ \beta_{s_{\ell_S}} \\ 1 \end{bmatrix}
          \begin{bmatrix} \beta & \beta_{s_1} & \cdots & \beta_{s_{\ell_S}} & 1 \end{bmatrix}    (13)

The same formula applies to add any vector (not necessarily the candidate) to S, with parameters β, β_j and γ_c calculated as in (10) and (12). The expansion of R, like incremental learning itself, is reversible. To remove a margin vector k from S, R is contracted as:

    \mathcal{R}_{ij} \rightarrow \mathcal{R}_{ij} - \mathcal{R}_{kk}^{-1} \mathcal{R}_{ik} \mathcal{R}_{kj}, \quad \forall i, j \in S \cup \{0\}, \; i, j \neq k    (14)

where index 0 refers to the b-term. The update rules (13) and (14) are similar to on-line recursive estimation of the covariance of (sparsified) Gaussian processes [2].
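The sensitivity computations (10)-(12) and the rank-one expansion (13) are the computational core of each incremental step. Below is a minimal NumPy sketch; the array layout (index 0 for the b-term) matches (7), while the function names and calling convention are assumptions rather than the authors' implementation:

```python
import numpy as np

def sensitivities(R, Q_col_c, y, S, c):
    """Coefficient sensitivities (beta, beta_j), eq. (10), and the margin
    sensitivity gamma_c of the candidate c, eq. (12).

    R       : inverse Jacobian, shape (len(S)+1, len(S)+1); row/column 0 is the b-term
    Q_col_c : array or dict giving Q_{ic} = y_i y_c K(x_i, x_c) for all indices i
    y       : labels; S : list of margin support vector indices; c : candidate index
    """
    v = np.concatenate(([y[c]], [Q_col_c[s] for s in S]))   # (y_c, Q_{s1 c}, ..., Q_{sl c})
    beta = -R @ v                                           # eq. (10)
    gamma_c = (Q_col_c[c]
               + sum(Q_col_c[s] * beta[k + 1] for k, s in enumerate(S))
               + y[c] * beta[0])                            # eq. (12), evaluated at i = c
    return beta, gamma_c

def expand_R(R, beta, gamma_c):
    """Rank-one expansion of the inverse Jacobian when a vector joins S, eq. (13)."""
    n = R.shape[0]
    R_new = np.zeros((n + 1, n + 1))
    R_new[:n, :n] = R
    u = np.append(beta, 1.0)
    return R_new + np.outer(u, u) / gamma_c
```

The contraction (14) is the exact inverse of this expansion, so membership changes in S never require refactoring the Jacobian from scratch.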
[Figure 2: Incremental learning. A new vector, initially (for α_c = 0) classified with negative margin g_c < 0, becomes a new margin or error vector.]

2.5 Incremental procedure

Let ℓ → ℓ + 1, by adding point c (candidate margin or error vector) to D: D^{ℓ+1} = D^ℓ ∪ {c}. Then the new solution {α_i^{ℓ+1}, b^{ℓ+1}}, i = 1, ..., ℓ + 1, is expressed in terms of the present solution {α_i^ℓ, b^ℓ}, the present inverse Jacobian R, and the candidate x_c, y_c, as:

Algorithm 1 (Incremental Learning, ℓ → ℓ + 1)
1. Initialize α_c to zero;
2. If g_c > 0, terminate (c is not a margin or error vector);
3. If g_c ≤ 0, apply the largest possible increment Δα_c so that (the first) one of the following conditions occurs:
(a) g_c = 0: Add c to margin set S, update the inverse Jacobian R accordingly, and terminate;
(b) α_c = C: Add c to error set E, and terminate;
(c) Elements of D migrate across S, E, and R ("bookkeeping," Section 2.3): Update membership of elements and, if S changes, update the inverse Jacobian R accordingly;
and repeat as necessary.

The incremental procedure is illustrated in Figure 2. Old vectors, from previously seen training data, may change status along the way, but the process of adding the training point c to the solution converges in a finite number of steps. A sketch of the bookkeeping step that bounds each increment is given after Section 2.6.

2.6 Practical considerations

The trajectory of an example incremental training session is shown in Figure 3. The algorithm yields results identical to those at convergence using other QP approaches [7], with comparable speeds on various data sets ranging up to several thousands of training points¹. A practical on-line variant for larger data sets is obtained by keeping track only of a limited set of "reserve" vectors, R = {i ∈ D : 0 < g_i ≤ ε}, and discarding all data for which g_i > ε. For small ε, this implies a small overhead in memory over S and E. The larger ε, the smaller the probability of missing a future margin or error vector in previous data. The resulting storage requirements are dominated by that for the inverse Jacobian, which scales as ℓ_S², where ℓ_S is the number of margin support vectors.

[Figure 3: Trajectory of coefficients α_i as a function of iteration step during training, for ℓ = 100 non-separable points in two dimensions, with C = 10, and using a Gaussian kernel. The data sequence is shown on the left.]

¹ Matlab code and data are available at http://bach.ece.jhu.edu/pub/gert/svm/incremental.
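The bookkeeping step that bounds each increment (Section 2.3) determines which set-membership event happens first as α_c grows. Below is a minimal sketch, assuming list/float inputs for the current coefficients, margins, and sensitivities; the function name and interface are hypothetical and the sketch covers only the generic case (nonzero sensitivities), not the authors' released code:

```python
def max_increment(alpha_c, g_c, gamma_c, C,
                  alpha_S, beta_S, g_E, gamma_E, g_R, gamma_R):
    """Largest admissible increment of alpha_c and the first condition it hits
    (conditions 1-5 of Section 2.3). alpha_S/beta_S range over the margin set S,
    g_E/gamma_E over the error set E, g_R/gamma_R over the reserve set R."""
    candidates = []
    # 1. the candidate's own margin g_c rises to 0 (it joins S)
    if gamma_c > 0:
        candidates.append((-g_c / gamma_c, "c joins S"))
    # 2. the candidate's coefficient reaches the box constraint (it joins E)
    candidates.append((C - alpha_c, "c joins E"))
    # 3. a margin vector's alpha_j is driven to C or to 0, depending on sign(beta_j)
    for a_j, b_j in zip(alpha_S, beta_S):
        if b_j > 0:
            candidates.append(((C - a_j) / b_j, "margin vector moves S -> E"))
        elif b_j < 0:
            candidates.append((-a_j / b_j, "margin vector moves S -> R"))
    # 4. an error vector's margin g_i rises to 0 (it moves E -> S)
    for g_i, gm_i in zip(g_E, gamma_E):
        if gm_i > 0:
            candidates.append((-g_i / gm_i, "error vector moves E -> S"))
    # 5. a reserve vector's margin g_i falls to 0 (it moves R -> S)
    for g_i, gm_i in zip(g_R, gamma_R):
        if gm_i < 0:
            candidates.append((-g_i / gm_i, "reserve vector moves R -> S"))
    return min(candidates, key=lambda t: t[0])
```

An outer loop implementing Algorithm 1 would repeatedly call such a routine, apply the returned increment through (8), (9), and (11), update set membership, and update the inverse Jacobian R via (13) or (14) whenever S changes.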
3 Decremental "Unlearning"

Leave-one-out (LOO) is a standard procedure for predicting the generalization power of a trained classifier, both from a theoretical and an empirical perspective [12]. It is naturally implemented by decremental unlearning, adiabatic reversal of incremental learning, on each of the training data from the full trained solution. Similar (but different) bookkeeping of elements migrating across S, E and R applies as in the incremental case.

[Figure 4: Leave-one-out (LOO) decremental unlearning (α_c → 0) for estimating generalization performance, directly on the training data. g_c^{\c} < −1 reveals a LOO classification error.]

3.1 Leave-one-out procedure

Let ℓ → ℓ − 1, by removing point c (margin or error vector) from D: D^{\c} = D \ {c}. The solution {α_i^{\c}, b^{\c}} is expressed in terms of {α_i, b}, R, and the removed point x_c, y_c. The solution yields the LOO margin g_c^{\c} (the margin of c under the solution trained without c), which determines whether leaving c out of the training set generates a classification error (g_c^{\c} < −1). Starting from the full ℓ-point solution:

Algorithm 2 (Decremental Unlearning, ℓ → ℓ − 1, and LOO Classification)
1. If c is not a margin or error vector: terminate, "correct" (c is already left out, and correctly classified);
2. If c is a margin or error vector with g_c < −1: terminate, "incorrect" (by default as a training error);
3. If c is a margin or error vector with g_c ≥ −1, apply the largest possible decrement Δα_c so that (the first) one of the following conditions occurs:
(a) g_c = −1: terminate, "incorrect";
(b) α_c = 0: terminate, "correct";
(c) Elements of D migrate across S, E, and R: update membership of elements and, if S changes, update the inverse Jacobian R accordingly;
and repeat as necessary.

The leave-one-out procedure is illustrated in Figure 4.

[Figure 5: Trajectory of the LOO margin g_c as a function of the leave-one-out coefficient α_c. The data and parameters are as in Figure 3.]

3.2 Leave-one-out considerations

If an exact LOO estimate is requested, two passes through the data are required. The LOO pass has similar run-time complexity and memory requirements as the incremental learning procedure. This is significantly better than the conventional approach to empirical LOO evaluation, which requires ℓ (partial, but possibly still extensive) training sessions. There is a clear correspondence between generalization performance and the LOO margin sensitivity γ_c. As shown in Figure 4, the value of the LOO margin g_c^{\c} is obtained from the sequence of g_c vs. α_c segments for each of the decrement steps, and is thus determined by their slopes γ_c. Incidentally, the LOO approximation using linear response theory in [6] corresponds to the first segment of the LOO procedure, effectively extrapolating the value of g_c^{\c} from the initial value of γ_c. This simple LOO approximation gives satisfactory results in most (though not all) cases, as illustrated in the example LOO session of Figure 5; a minimal sketch of the approximation is given below. Recent work in statistical learning theory has sought improved generalization performance by considering non-uniformity of distributions in feature space [13] or non-uniformity in the kernel matrix eigenspectrum [10]. A geometrical interpretation of decremental unlearning, presented next, sheds further light on the dependence of generalization performance, through γ_c, on the geometry of the data.
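The following is a minimal sketch of the linear-response LOO approximation just described, which extrapolates along the first decrement segment only; function names are assumptions, and the exact LOO estimate would instead follow Algorithm 2 to completion:

```python
def loo_margin_estimate(g_c, gamma_c, alpha_c):
    """First-order estimate of the leave-one-out margin g_c^{\\c}, extrapolated
    along the first decrement segment with slope gamma_c (Section 3.2).
    Unlearning drives alpha_c -> 0, so the total coefficient change is -alpha_c."""
    return g_c - gamma_c * alpha_c

def loo_error_estimate(g, gamma, alpha):
    """Approximate count of leave-one-out errors: a margin or error vector is
    flagged when its estimated LOO margin drops below -1, i.e. y_c f^{\\c}(x_c) < 0.
    Points with alpha_c = 0 are skipped (step 1 of Algorithm 2)."""
    return sum(1 for g_c, gm_c, a_c in zip(g, gamma, alpha)
               if a_c > 0 and loo_margin_estimate(g_c, gm_c, a_c) < -1.0)
```

The exact procedure would re-evaluate γ_c each time set membership changes during the decrement, rather than relying on its initial value as done here.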
4 Geometric Interpretation in Feature Space

The differential Kuhn-Tucker conditions (4) and (5) translate directly in terms of the sensitivities γ_i and β_j as:

    \gamma_i = Q_{ic} + \sum_{j \in S} Q_{ij} \beta_j + y_i \beta, \quad \forall i \in D \cup \{c\}    (15)

    0 = y_c + \sum_{j \in S} y_j \beta_j.    (16)

Through the nonlinear map φ(·) into feature space, the kernel matrix elements reduce to linear inner products:

    Q_{ij} = y_i y_j K(x_i, x_j) = \bigl( y_i \varphi(x_i) \bigr) \cdot \bigl( y_j \varphi(x_j) \bigr), \quad \forall i, j    (17)

and the KT sensitivity conditions (15) and (16) in feature space become:

    \gamma_i = y_i \varphi(x_i) \cdot \Bigl( y_c \varphi(x_c) + \sum_{j \in S} \beta_j \, y_j \varphi(x_j) \Bigr) + y_i \beta, \quad \forall i \in D \cup \{c\}    (18)

    0 = y_c + \sum_{j \in S} y_j \beta_j.    (19)

Since γ_i ≡ 0, ∀i ∈ S, (18) and (19) are equivalent to minimizing a functional:

    \min_{\beta_j} W_c = \frac{1}{2} \Bigl\| \, y_c \varphi(x_c) + \sum_{j \in S} \beta_j \, y_j \varphi(x_j) \, \Bigr\|^2    (20)

subject to the equality constraint (19), with Lagrange parameter β. Furthermore, the optimal value of W_c immediately yields the sensitivity γ_c, from (18):

    \gamma_c = 2 W_c = \Bigl\| \, y_c \varphi(x_c) + \sum_{j \in S} \beta_j \, y_j \varphi(x_j) \, \Bigr\|^2 \ge 0.    (21)

In other words, the distance in feature space between sample c and its projection on S along (16) determines, through (21), the extent to which leaving out c affects the classification of c. Note that only margin support vectors are relevant in (21), and not the error vectors which otherwise contribute to the decision boundary.

5 Concluding Remarks

Incremental learning and, in particular, decremental unlearning offer a simple and computationally efficient scheme for on-line SVM training and exact leave-one-out evaluation of the generalization performance on the training data. The procedures can be directly extended to a broader class of kernel learning machines with convex quadratic cost functional under linear constraints, including SV regression. The algorithm is intrinsically on-line and extends to query-based learning methods [1]. Geometric interpretation of decremental unlearning in feature space elucidates a connection, similar to [13], between generalization performance and the distance of the data from the subspace spanned by the margin vectors.

References

[1] C. Campbell, N. Cristianini and A. Smola, "Query Learning with Large Margin Classifiers," in Proc. 17th Int. Conf. Machine Learning (ICML 2000), Morgan Kaufmann, 2000.
[2] L. Csato and M. Opper, "Sparse Representation for Gaussian Process Models," in Adv. Neural Information Processing Systems (NIPS 2000), vol. 13, 2001.
[3] T.-T. Frieß, N. Cristianini and C. Campbell, "The Kernel Adatron Algorithm: A Fast and Simple Learning Procedure for Support Vector Machines," in Proc. 15th Int. Conf. Machine Learning, Morgan Kaufmann, 1998.
[4] T. S. Jaakkola and D. Haussler, "Probabilistic Kernel Methods," Proc. 7th Int. Workshop on Artificial Intelligence and Statistics, 1998.
[5] T. Joachims, "Making Large-Scale Support Vector Machine Learning Practical," in Schölkopf, Burges and Smola, Eds., Advances in Kernel Methods – Support Vector Learning, Cambridge MA: MIT Press, 1998, pp. 169-184.
[6] M. Opper and O. Winther, "Gaussian Processes and SVM: Mean Field Results and Leave-One-Out," in A. Smola, P. Bartlett, B. Schölkopf and D. Schuurmans, Eds., Advances in Large Margin Classifiers, Cambridge MA: MIT Press, 2000, pp. 43-56.
[7] E. Osuna, R. Freund and F. Girosi, "An Improved Training Algorithm for Support Vector Machines," Proc. 1997 IEEE Workshop on Neural Networks for Signal Processing, pp. 276-285, 1997.
[8] J. C. Platt, "Fast Training of Support Vector Machines Using Sequential Minimal Optimization," in Schölkopf, Burges and Smola, Eds., Advances in Kernel Methods – Support Vector Learning, Cambridge MA: MIT Press, 1998, pp. 185-208.
[9] M. Pontil and A. Verri, "Properties of Support Vector Machines," Neural Computation, vol. 10, pp. 955-974, 1997.
[10] B. Schölkopf, J. Shawe-Taylor, A. J. Smola and R. C. Williamson, "Generalization Bounds via Eigenvalues of the Gram Matrix," NeuroCOLT Technical Report 99-035, 1999.
[11] N. A. Syed, H. Liu and K. K. Sung, "Incremental Learning with Support Vector Machines," in Proc. Int. Joint Conf. on Artificial Intelligence (IJCAI-99), 1999.
[12] V. Vapnik, The Nature of Statistical Learning Theory, New York: Springer-Verlag, 1995.
[13] V. Vapnik and O. Chapelle, "Bounds on Error Expectation for SVM," in Smola, Bartlett, Schölkopf and Schuurmans, Eds., Advances in Large Margin Classifiers, Cambridge MA: MIT Press, 2000.
