New Primal SVM Solver with Linear Computational Cost for Big Data Classifications

Feiping Nie (FEIPINGNIE@GMAIL.COM), Yizhen Huang (HUANG.YIZHEN@GMAIL.COM), Xiaoqian Wang (XIAOQIAN.WANG93@MAVS.UTA.EDU), Heng Huang (HENG@UTA.EDU)
Computer Science and Engineering Department, University of Texas at Arlington, Arlington, TX, 76019

Abstract

Support Vector Machines (SVM) are among the most popular classification techniques in machine learning, hence designing fast primal SVM algorithms for large-scale datasets has been a hot topic in recent years. This paper presents a new L2-norm regularized primal SVM solver using Augmented Lagrange Multipliers, with linear computational cost for Lp-norm loss functions. The most computationally intensive steps (which determine the algorithmic complexity) of the proposed algorithm are purely and simply matrix-by-vector multiplications, which can be easily parallelized on a multi-core server. We implement and integrate our algorithm into the interfaces and framework of the well-known LibLinear software toolbox. Experiments show that our algorithm has stable performance and is on average faster than state-of-the-art solvers such as SVMperf, Pegasos and the LibLinear toolbox that integrates the TRON, PCD and DCD algorithms.

Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).

1. Introduction

Because most areas of science, simulations and experiments are flooded with big data, there is an urgent need to develop large-scale data classification techniques. As one of the most widely used classification methods, SVM calls for fast solution algorithms. Given n instance-label pairs (x_i, y_i), 1 ≤ i ≤ n, x_i ∈ R^d, y_i ∈ {−1, +1}, the L2-norm regularized SVM in the primal form aims to optimize the following unconstrained problem:

  min_{w,b} (1/2) wᵀw + C Σ_{i=1}^n loss(wᵀx_i + b, y_i)    (1)

where the support vector w ∈ R^d and the intercept (i.e. bias term) b ∈ R are the variables; loss(u, v) is a loss function measuring the difference between two scalars u ∈ R and v ∈ R; and C ∈ R is the weight adjusting the importance between the regularization term wᵀw and the loss term Σ_{i=1}^n loss(wᵀx_i + b, y_i).

If the loss function is selected to be the hinge loss loss(u, v) = max(1 − uv, 0), the problem becomes the L2-norm regularized L1-norm loss primal SVM, a.k.a. the L1-primal SVM in some literature:

  min_{w,b} (1/2) wᵀw + C Σ_{i=1}^n (1 − (wᵀx_i + b) y_i)_+    (2)

where the operator (u)_+ := max(u, 0) returns the input scalar u ∈ R unchanged if u is non-negative, and zero otherwise. Such notation is used for better readability.

If the loss function is selected to be the squared hinge loss loss(u, v) = max(1 − uv, 0)², the problem becomes the L2-norm regularized L2-norm loss primal SVM, a.k.a. the L2-primal SVM:

  min_{w,b} (1/2) wᵀw + C Σ_{i=1}^n (1 − (wᵀx_i + b) y_i)²_+    (3)

The primal SVM is attractive, partly because it assures a continuous decrease in the primal objective function (Keerthi & DeCoste, 2005). Designing fast primal SVM solvers for large-scale datasets has been a hot and important topic in recent years. The method in (Mangasarian & Musicant, 2001) needs to compute the inverse of a matrix of size (d+1)×(d+1) and is slower than later solvers. (Mangasarian, 2002) and (Keerthi & DeCoste, 2005) proposed modified Newton methods to train the L2-primal SVM. As Eq. (3) is not twice differentiable, to obtain the Newton direction they have to use the generalized Hessian matrix (i.e. the generalized second-order derivative), which is not efficient enough. To overcome this limitation, a Trust RegiOn Newton method (TRON) (Lin et al., 2008) was proposed to solve the L2-primal SVM and logistic regression. The toolbox SVMperf (Joachims, 2006) used a cutting-plane technique to solve the L1-primal SVM; (Smola et al., 2008) applied bundle methods and viewed SVMperf as a special case. (Zhang, 2004) proposed a Stochastic Gradient Descent (SGD) method for the primal SVM with any type of loss function; Pegasos (Shalev-Shwartz et al., 2007) extended Zhang's work and developed an algorithm which alternates between stochastic gradient descent steps and projection steps, with better performance than SVMperf. Another stochastic gradient implementation similar to Pegasos was published in (Bottou, 2007). More recently, the L2-primal SVM has been solved by the PCD (Primal Coordinate Descent) algorithm (Chang et al., 2008) with coordinate descent methods.

All algorithms discussed above are iterative and update w at each iteration. Based on this understanding, it is natural to find that there is a tradeoff between the computational cost spent in each iteration and the number of iterations needed (Lin et al., 2008). Pegasos randomly subsamples a few instances at a time, so its cost per iteration is low, but the number of iterations is high. In contrast, Newton methods such as TRON spend a considerable amount of time per iteration, but converge at fast rates. The DCD (Dual Coordinate Descent) algorithm (Hsieh et al., 2008) bypasses the operation of maintaining gradients at each iteration in the PCD (Chang et al., 2008), and lowers the algorithmic complexity per iteration from O(n d̄) to O(d̄) in linear SVM cases, where d̄ is the average number of nonzero elements per instance. However, such a reduction in complexity does not apply to nonlinear SVM with kernels. Moreover, as shown in our experiments (see §4), the convergence of the DCD may be extremely lengthy on some datasets. In large-scale scenarios, an approximate solution of the optimization problem is usually enough to produce a good model (Lin et al., 2008; Chang et al., 2008; Hsieh et al., 2008). Thus, methods with a low-cost iteration are preferred, as they can quickly generate a reasonable model. However, if one specifies an unsuitable stopping condition, such methods may fall into the situation of lengthy iterations. To address these issues, we propose a new SVM solver using Augmented Lagrange Multipliers (ALM), with simple matrix-by-vector multiplications per iteration, linear computational cost, and provable convergence.

The rest of the manuscript is organized as follows: Section 2 presents our proposed algorithm; Section 3 analyzes its optimality and convergence; experimental results are given in Section 4; Section 5 contains the concluding remarks.

2. Proposed Algorithm

For ease of notation and better extensibility, we first unify Eq. (2) and Eq. (3), and generalize them to minimize the objective function of the Lp-primal SVM:

  obj(w, b) = (1/2) wᵀw + C Σ_{i=1}^n (1 − (wᵀx_i + b) y_i)ᵖ_+    (4)

where p ∈ R is a constant, typically 1 ≤ p ≤ 2 for being meaningful. A fundamental difficulty in both the L2- and the L1-primal SVM is that their loss functions (i.e. the hinge loss and the squared hinge loss) are piecewise. With this observation and the trick 1 − (wᵀx_i + b) y_i = y_i y_i − (wᵀx_i + b) y_i = y_i (y_i − (wᵀx_i + b)), we introduce auxiliary variables e_i = y_i − (wᵀx_i + b), 1 ≤ i ≤ n, and the minimization of Eq. (4) becomes:

  min_{w,b,e_i = y_i − (wᵀx_i + b)} (1/2) wᵀw + C Σ_{i=1}^n (y_i e_i)ᵖ_+    (5)

Based on ALM (Gill & Robinson, 2012), Eq. (5) is turned into an unconstrained optimization problem by studying the Lagrangian function of Eq. (5):

  L(w, b, e, λ) = (1/2) wᵀw + C Σ_{i=1}^n (y_i e_i)ᵖ_+ + λᵀ(Xᵀw + 1b − y + e)    (6)

where X ∈ R^{d×n} = [x_1, x_2, ..., x_n], 1 ∈ R^{n×1} = [1, 1, ..., 1]ᵀ, y ∈ R^{n×1} = [y_1, y_2, ..., y_n]ᵀ and e ∈ R^{n×1} = [e_1, e_2, ..., e_n]ᵀ. The last term is the pointwise multiplication of the amounts of violation of the n constraints (wᵀx_i + b) − y_i + e_i = 0 with the vector λ ∈ R^n consisting of n Lagrangian multipliers. ALM adds the supplemental term (μ/2)‖Xᵀw + 1b − y + e‖² to Eq. (6): as μ augments to infinity, this term forces the n constraints to be satisfied. So the augmented Lagrangian function of Eq. (5) is defined as:

  AL(w, b, e, λ, μ) = (1/2) wᵀw + C Σ_{i=1}^n (y_i e_i)ᵖ_+ + λᵀ(Xᵀw + 1b − y + e) + (μ/2)‖Xᵀw + 1b − y + e‖²    (7)

Arranging the last two terms of Eq. (7) into a quadratic form leads to:

  AL(w, b, e, λ, μ) = (1/2) wᵀw + C Σ_{i=1}^n (y_i e_i)ᵖ_+ + (μ/2)‖Xᵀw + 1b − y + e + λ/μ‖²    (8)

Note that from Eq. (7) to Eq. (8) we add a term −‖λ‖²/(2μ) that remains constant when performing the optimization over the variables w, b and e within a single iteration. As μ → ∞, this term is very close to zero and thus negligible eventually. At the k-th iteration, similarly to the Iterative Thresholding (IT) method (Wright et al., 2009), the amounts of violation of the n constraints are used to update the Lagrangian multiplier vector:

  λ_(k) = λ_(k−1) + μ_(k)(Xᵀw + 1b − y + e)    (9)

The augmented penalty parameter μ is monotonically non-decreasing over the iterative process. How to determine the penalty parameter series μ_(k) for every iteration k will be discussed in §3. We omit the bracketed subscript (k) when there is no risk of confusion; for the implementation we actually need only a single variable (or array) to store these scalars (or vectors). Note that plain subscripts and bracketed subscripts refer to quite different meanings in this paper.

Remark: A merit of the ALM is that the optimal step size for updating λ is proven to be the chosen penalty parameter μ_(k), making the parameter tuning much easier than that of the IT method (Gill & Robinson, 2012).

Now at each iteration we can split the updating of w, b, e into two independent portions: minimizing e with w, b fixed, and minimizing w, b with e fixed.

When w, b are fixed, the term wᵀw is constant and Eq. (8) can be decomposed into n independent single-variable minimization problems w.r.t. e_i:

  e_i = argmin_{e_i} F_i(e_i) = argmin_{e_i} C (y_i e_i)ᵖ_+ + (μ/2)(e_i − (y_i − wᵀx_i − b − λ_i/μ))²
      = argmin_{e_i} γ (y_i e_i)ᵖ_+ + (1/2)(e_i − t_i)²    (10)

where γ = C/μ, λ_i is the i-th element of λ, and t_i = y_i − wᵀx_i − b − λ_i/μ is a constant. Solving Eq. (10) is easy, as e_i is the minimizer of the single-variable, 2-piece piecewise function F_i(e_i): we just need to find its minima over e_i with y_i e_i ≤ 0 and with y_i e_i ≥ 0 separately, and pick the smaller one. When y_i e_i ≤ 0, (y_i e_i)ᵖ_+ = 0, so we only need to pick the smaller between F_i(0) and F_i(t_i). When y_i e_i ≥ 0, we need to solve the equation:

  ∂F_i/∂e_i = p γ y_i (y_i e_i)^{p−1} + e_i − t_i = 0    (11)

For arbitrary given p and γ, solving Eq. (11) is difficult. But fortunately, in our scenario it always holds that p ≥ 1 and γ ≥ 0, so ∂F_i/∂e_i is monotonically increasing w.r.t. e_i on this piece. We can therefore use the well-known binary search method to narrow the possible range of e_i by half with each operation, and obtain an ε-accurate solution in O(log 1/ε) time. In particular, we can write the explicit solutions straightforwardly when p = 1 or p = 2.

For the L1-primal SVM (p = 1), with y_i = ±1:

  e_i = t_i − y_i γ   when y_i t_i ≥ γ
  e_i = 0             when 0 ≤ y_i t_i ≤ γ
  e_i = t_i           when y_i t_i ≤ 0    (12)

For the L2-primal SVM (p = 2), with y_i = ±1:

  e_i = t_i / (1 + 2γ)   when y_i t_i ≥ 0
  e_i = t_i              when y_i t_i ≤ 0    (13)

When e is fixed, the term Σ_{i=1}^n (y_i e_i)ᵖ_+ is constant, and Eq. (8) becomes an L2-norm regularized Least Square Regression (LSR) problem:

  G(w, b) = min_{w,b} (1/μ) wᵀw + ‖Xᵀw + 1b − z‖²    (14)

where z = y − e − λ/μ is a constant vector. Eq. (14) can be turned into a standard LSR problem as below: if we set

  v = [w; b],   A = [Xᵀ, 1; μ^{−1/2} I, 0],   h = [z; 0]

then

  G(w, b) = G(v) = min_v ‖Av − h‖²    (15)

which can be solved by many standard libraries, such as the default LSQR function in MATLAB. Hereby we finish the illustration of the proposed exact SVM-ALM algorithm, summarized in Algorithm 1. To be compatible with existing methods such as (Fan et al., 2008; Chang et al., 2008; Hsieh et al., 2008; Lin et al., 2008), the stopping condition is set to ‖∇obj(w, b)‖ ≤ ε, where the user-specified parameter ε is 0.01 by default and ∇obj(w, b) is the gradient of obj(w, b) w.r.t. w.

However, the LSQR used here costs O(n d̄²), where d̄ is the average number of nonzero elements per instance, which is as costly as computing a matrix inverse. This is too expensive, as we need to afford such computation at every iteration. Driven by this consideration and the tradeoff between cost per iteration and number of iterations discussed in the introduction, we instead use an optimal step-size gradient method to update w and b at each iteration. The gradients of G(w, b) w.r.t. w and b are:

  w_g = ∂G/∂w = X(Xᵀw + 1b − z) + (1/μ) w,   b_g = ∂G/∂b = n b + 1ᵀ(Xᵀw − z)    (16)

Finding the optimal step size s is a single-variable quadratic minimization problem:

  min_s (1/μ)(w − s w_g)ᵀ(w − s w_g) + ‖Xᵀ(w − s w_g) + 1(b − s b_g) − z‖²    (17)

which has the explicit solution

  s = [(Xᵀw_g + 1b_g)ᵀ(Xᵀw + 1b − z) + (1/μ) w_gᵀw] / [(Xᵀw_g + 1b_g)ᵀ(Xᵀw_g + 1b_g) + (1/μ) w_gᵀw_g]
    = [w_gᵀw_g + b_g²] / [(Xᵀw_g + 1b_g)ᵀ(Xᵀw_g + 1b_g) + (1/μ) w_gᵀw_g]    (18)

The last equality only serves to simplify the computation of s, and can be verified by substituting Eq. (16) for w_g and b_g in the numerator. We prefer the simplified formula because it saves two matrix-by-vector multiplications.
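Putting the pieces together, the e-update of Eqs. (12)/(13), the gradient step of Eqs. (16)/(18), and the multiplier update of Eq. (9), the inexact procedure can be sketched as follows. This is a minimal NumPy illustration under simplifying assumptions (dense data; a fixed geometric schedule for μ capped at 10⁵ instead of Eq. (29); only the closed forms for p ∈ {1, 2}, the general-p bisection of Eq. (11) is omitted); the function name and default parameters are ours, not the released LibLinear integration.

```python
import numpy as np

def svm_alm_inexact(X, y, C=1.0, p=2, mu=1.0, rho=1.5, mu_max=1e5, n_iter=100):
    """Sketch of the inexact SVM-ALM for p in {1, 2}.

    X: d x n matrix whose columns are instances; y: labels in {-1, +1}.
    A geometric penalty schedule mu <- min(rho*mu, mu_max) is used here in
    place of Eq. (29), as a simplification.
    """
    d, n = X.shape
    w = np.ones(d)
    b = 0.0
    lam = np.zeros(n)
    one = np.ones(n)
    for _ in range(n_iter):
        # Step 1: update e with the closed forms of Eq. (12) / Eq. (13).
        t = y - X.T @ w - b - lam / mu
        gamma = C / mu
        yt = y * t
        if p == 1:
            e = np.where(yt >= gamma, t - y * gamma, np.where(yt >= 0, 0.0, t))
        else:
            e = np.where(yt >= 0, t / (1.0 + 2.0 * gamma), t)
        # Step 2: one optimal-step gradient step on Eq. (14), via Eqs. (16), (18).
        z = y - e - lam / mu
        r = X.T @ w + b * one - z              # residual of the LSR problem
        wg = X @ r + w / mu                    # Eq. (16); note b_g = 1^T r
        bg = r.sum()
        q = X.T @ wg + bg * one
        den = q @ q + (wg @ wg) / mu
        if den < 1e-30:                        # gradient vanished: stop early
            break
        s = (wg @ wg + bg * bg) / den          # Eq. (18)
        w -= s * wg
        b -= s * bg
        # Step 3: update the multipliers with Eq. (9), then grow mu.
        lam += mu * (X.T @ w + b * one - y + e)
        mu = min(rho * mu, mu_max)
    return w, b
```

On a small linearly separable toy set, the returned (w, b) classifies the training points correctly within the default 100 iterations.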
Algorithm 1: Exact SVM-ALM for the Lp-primal SVM
  Input: p; X; y; μ_(1), μ_(2), ..., μ_(∞)
  Initialize w = 1, b = 0, λ = 0.
  repeat
    1. Update e with Eq. (11), Eq. (12) or Eq. (13).
    2. Update w, b using the LSQR with Eq. (15).
    3. Update λ with Eq. (9).
  until ‖∇obj(w, b)‖ ≤ ε

Algorithm 2: Inexact SVM-ALM for the Lp-primal SVM
  Input: p; X; y; μ_(1), μ_(2), ..., μ_(∞)
  Initialize w = 1, b = 0, λ = 0.
  repeat
    1. Update e with Eq. (11), Eq. (12) or Eq. (13).
    2. Update w by w − s w_g and b by b − s b_g, where w_g, b_g and s are computed with Eq. (16) and Eq. (18), respectively.
    3. Update λ with Eq. (9).
  until ‖∇obj(w, b)‖ ≤ ε

We summarize the proposed inexact SVM-ALM in Algorithm 2. At each iteration, Algorithm 2 needs only three matrix-by-vector multiplications, with complexity O(n d̄), where d̄ is the average number of nonzero elements per instance. The several pointwise additions and multiplications between vectors have complexity either O(d) or O(n), which is negligible compared to O(n d̄). In large-scale data classifications, high-dimensional features are usually reduced by a prescreening procedure, hence d̄ is not large. Our new algorithm therefore has computational cost linear in the number of data instances n.

3. Convergence and Optimality

We first prove some lemmas. Throughout this section, ⊙ denotes the pointwise product, and we write the objective of Eq. (5) in the shorthand ‖w‖² + ‖C(y ⊙ e)_+‖_p.

Lemma 1. Let H be a real Hilbert space endowed with an inner product ⟨·,·⟩ and a corresponding norm ‖·‖, and let v ∈ ∂‖u‖, where ∂f(u) denotes the subgradient of f at u. Then ‖v‖* = 1 if u ≠ 0, and ‖v‖* ≤ 1 if u = 0, where ‖·‖* is the dual norm of ‖·‖.

Proof: Because v ∈ ∂‖u‖,

  ‖d‖ − ‖u‖ ≥ ⟨v, d − u⟩,  ∀ d ∈ H    (19)

If u ≠ 0, setting d = 0 and d = 2u leads to

  ‖u‖ = ⟨v, u⟩ ≤ ‖u‖ ‖v‖*    (20)

Thus we have ‖v‖* ≥ 1. On the other side, we have

  ‖d − u‖ ≥ ‖d‖ − ‖u‖ ≥ ⟨v, d − u⟩,  ∀ d ∈ H    (21)

which leads to

  ⟨v, d − u⟩ / ‖d − u‖ ≤ 1,  ∀ d ≠ u    (22)

So ‖v‖* ≤ 1, and it can be concluded that ‖v‖* = 1. If u = 0, Eq. (19) is equivalent to

  ⟨v, d⟩ ≤ 1,  ∀ ‖d‖ = 1    (23)

So ‖v‖* ≤ 1 by the definition of the dual norm. □
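As a quick numeric illustration of Lemma 1 for the Euclidean norm (whose dual norm is again ‖·‖₂, and whose subgradient at u ≠ 0 is the single vector v = u/‖u‖₂), the following check uses a made-up vector u and random test points d:

```python
import numpy as np

u = np.array([3.0, -4.0])
v = u / np.linalg.norm(u)        # the (unique) subgradient of ||.||_2 at u != 0
# Lemma 1: the dual norm of v equals 1 when u != 0.
assert np.isclose(np.linalg.norm(v), 1.0)

# The subgradient inequality ||d|| - ||u|| >= <v, d - u> holds for any d.
rng = np.random.default_rng(0)
for d in rng.normal(size=(100, 2)):
    assert np.linalg.norm(d) - np.linalg.norm(u) >= v @ (d - u) - 1e-12
```

The inequality holds with equality at d = u and d = 2u, which is exactly the choice used in the proof above.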
Lemma 2. The sequence {λ_(k)} in either Algorithm 1 or Algorithm 2 is bounded.

Proof: From (w_(k), b_(k)) = argmin_{w,b} AL(w, b, e_(k), λ_(k−1), μ_(k)) and e_(k) = argmin_e AL(w_(k−1), b_(k−1), e, λ_(k−1), μ_(k)), we have:

  0 ∈ ∂_w AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k))
  0 ∈ ∂_b AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k))
  0 ∈ ∂_e AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k))    (24)

which indicates:

  0 ∈ ∂‖w_(k)‖₂ − λ_(k−1) − μ_(k)(Xᵀw + 1b − y + e)
  0 ∈ ∂‖C(y ⊙ e_(k))_+‖_p − λ_(k−1) − μ_(k)(Xᵀw + 1b − y + e)    (25)

Therefore, recalling the multiplier update of Eq. (9),

  λ_(k) ∈ ∂‖w_(k)‖₂,   λ_(k) ∈ ∂‖C(y ⊙ e_(k))_+‖_p    (26)

According to Lemma 1, the sequence {λ_(k)} in Algorithm 1 is bounded, because the dual norms of ‖·‖₂ and ‖·‖_p are ‖·‖₂ and ‖·‖_{p/(p−1)} (Lin et al., 2009), respectively. The boundedness of {λ_(k)} in Algorithm 2 can be proved in the same way. □

Lemma 3. The sequences {w_(k)}, {b_(k)}, {e_(k)} in either Algorithm 1 or Algorithm 2 are all bounded, if

  ‖w_(k+1)‖² + ‖C(y ⊙ e_(k+1))_+‖_p + (μ_(k+1)/2)‖Xᵀw_(k+1) + 1b_(k+1) − y + e_(k+1)‖²
  ≤ ‖w_(k)‖² + ‖C(y ⊙ e_(k))_+‖_p + (μ_(k)/2)‖Xᵀw_(k) + 1b_(k) − y + e_(k)‖²

for every k ≥ 0, and Σ_{k=1}^∞ μ_(k+1)/μ_(k)² < ∞.

Proof: As ‖w‖² + ‖C(y ⊙ e)_+‖_p + (μ/2)‖Xᵀw + 1b − y + e‖² is non-increasing as Algorithm 1 iterates, it can be verified that

  AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k)) ≤ AL(w_(k−1), b_(k−1), e_(k−1), λ_(k−2), μ_(k−1)) + [(μ_(k−1) + μ_(k)) / (2μ_(k−1)²)] ‖λ_(k−1) − λ_(k−2)‖²    (27)

The above inequality can be derived by substituting with Eq. (9) to eliminate λ_(k−1). So {AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k))} is upper bounded, owing to the boundedness of {λ_(k)} and to Σ_{k=1}^∞ (μ_(k) + μ_(k+1))/(2μ_(k)²) ≤ Σ_{k=1}^∞ μ_(k+1)/μ_(k)² < ∞. Thus

  ‖w_(k)‖² + ‖C(y ⊙ e_(k))_+‖_p = AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k)) − ‖λ_(k)‖²/(2μ_(k))    (28)

is upper bounded. Therefore {w_(k)} and {e_(k)} in Algorithm 1 are both bounded, which leads to the boundedness of {b_(k)}, as the constraint violation Xᵀw_(k) + 1b_(k) − y + e_(k) = (λ_(k) − λ_(k−1))/μ_(k) is bounded. It can be verified that exactly the same properties hold in Algorithm 2. □
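As a concrete check of the summability condition in Lemma 3, a geometrically increasing penalty sequence (the regime also discussed for the inexact method in §3) satisfies it; this worked example is ours, with μ_(1) > 0 and ratio ρ > 1 as illustrative assumptions:

```latex
\mu_{(k)} = \mu_{(1)}\rho^{k-1} \;\Rightarrow\;
\sum_{k=1}^{\infty} \frac{\mu_{(k+1)}}{\mu_{(k)}^{2}}
= \sum_{k=1}^{\infty} \frac{\mu_{(1)}\rho^{k}}{\mu_{(1)}^{2}\rho^{2k-2}}
= \frac{\rho^{2}}{\mu_{(1)}} \sum_{k=1}^{\infty} \rho^{-k}
= \frac{\rho^{2}}{\mu_{(1)}(\rho-1)} < \infty.
```

In contrast, a constant sequence μ_(k) ≡ μ gives Σ μ_(k+1)/μ_(k)² = Σ 1/μ, which diverges, so the condition genuinely requires μ_(k) to grow.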
The non-increasing requirement on ‖w‖² + ‖C(y ⊙ e)_+‖_p + (μ/2)‖Xᵀw + 1b − y + e‖² in Lemma 3 also implies the way to generate the sequence {μ_(k)}, by setting the upper limit of μ_(k+1):

  μ_(k+1) = [ (μ_(k)/2)‖Xᵀw_(k) + 1b_(k) − y + e_(k)‖² + ‖w_(k)‖² − ‖w_(k+1)‖² + ‖C(y ⊙ e_(k))_+‖_p − ‖C(y ⊙ e_(k+1))_+‖_p ] / [ (1/2)‖Xᵀw_(k+1) + 1b_(k+1) − y + e_(k+1)‖² ]    (29)

Because of Eqs. (10) and (14), we have

  AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k)) ≤ AL(w_(k−1), b_(k−1), e_(k), λ_(k−1), μ_(k)) ≤ AL(w_(k−1), b_(k−1), e_(k−1), λ_(k−1), μ_(k))

which ensures that {μ_(k)} generated by Eq. (29) is non-decreasing.

Owing to the precision limit, μ cannot increase to infinity in practical implementations of both Algorithm 1 and Algorithm 2; otherwise the significant digits of the terms (1/2)wᵀw and C Σ_{i=1}^n (y_i e_i)ᵖ_+ in AL(w, b, e, λ, μ) would be squeezed out by the extremely large term (μ/2)‖Xᵀw + 1b − y + e + λ/μ‖². More specifically, μ has an upper limit of 10⁵ as an implementation detail. We follow the convention of most existing work by using double-precision floating-point numbers. Using single precision, e.g. (Bottou, 2007), may reduce the computational time in some situations, but may cause numerical inaccuracy (Chang et al., 2008). An advantage of the ALM is that it converges to the exact optimal solution before μ augments to infinity (Gill & Robinson, 2012). In contrast, strictly speaking, the IT method (Wright et al., 2009) only finds approximate solutions.

Now we come to the main results of this section.

Theorem 1. The solution consisting of the limits of the sequences {w_(k)}, {b_(k)}, {e_(k)} in Algorithm 1 with Eq. (29) for updating μ, say (w_(∞), b_(∞), e_(∞)), is an optimal solution to the Lp-primal SVM problem, and the convergence rate is at least O(μ_(k)^{−1}) in the sense that

  | ‖w_(k)‖² + ‖C(y ⊙ e_(k))_+‖_p − obj* | = O(μ_(k)^{−1})

where obj* is the minimal value of obj in Eq. (4).

Proof: As the vital natural property of an ALM algorithm, the following is true:

  AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k))
  = min_{w,b,e} AL(w, b, e, λ_(k−1), μ_(k))
  ≤ min_{w,b,e: Xᵀw+1b−y+e=0} AL(w, b, e, λ_(k−1), μ_(k))
  = min_{w,b,e: Xᵀw+1b−y+e=0} ‖w‖² + ‖C(y ⊙ e)_+‖_p + ‖λ_(k−1)‖²/(2μ_(k))
  = min_{w,b} ‖w‖² + ‖C(1 − (Xᵀw + 1b) ⊙ y)_+‖_p + ‖λ_(k−1)‖²/(2μ_(k))
  = obj* + ‖λ_(k−1)‖²/(2μ_(k))    (30)

The first equality and the second inequality are obvious; the third equality holds because, when the constraints w.r.t. the auxiliary variables e are satisfied, the last term in Eq. (8) degenerates to ‖λ_(k−1)‖²/(2μ_(k)); the fourth equality is obtained by substituting the constraints, similarly to the conversion from Eq. (5) back to Eq. (4); and the fifth equality is according to the definition in Eq. (4).

In Algorithm 1, it can be verified that

  ‖w_(k)‖² + ‖C(y ⊙ e_(k))_+‖_p = AL(w_(k), b_(k), e_(k), λ_(k−1), μ_(k)) − ‖λ_(k)‖²/(2μ_(k))    (31)

Based on Eq. (30), we have

  ‖w_(k)‖² + ‖C(y ⊙ e_(k))_+‖_p ≤ obj* + ‖λ_(k−1)‖²/(2μ_(k)) − ‖λ_(k)‖²/(2μ_(k))

The boundedness of {λ_(k)} proved in Lemma 2 then leads to:

  obj* − O(μ_(k)^{−1}) ≤ ‖w_(k)‖² + ‖C(y ⊙ e_(k))_+‖_p ≤ obj* + O(μ_(k)^{−1})

Note that the two-sided range is needed, as the term O(μ_(k)^{−1}) may be either positive or negative. Hereby the convergence rate is proved. When k → ∞, O(μ_(k)^{−1}) is negligible, so

  ‖w_(∞)‖² + ‖C(y ⊙ e_(∞))_+‖_p ≤ obj*    (32)

According to Eq. (9), the constraints Xᵀw_(k) + 1b_(k) − y + e_(k) = (λ_(k) − λ_(k−1))/μ_(k) are satisfied when k → ∞:

  Xᵀw_(∞) + 1b_(∞) − y + e_(∞) = 0    (33)

Therefore, (w_(∞), b_(∞), e_(∞)) is an optimal solution to the Lp-primal SVM problem. □

Theorem 2. The solution consisting of the limits of the sequences {w_(k)}, {b_(k)}, {e_(k)} in Algorithm 2 with Eq. (29) for updating μ, say (w_(∞), b_(∞), e_(∞)), is an optimal solution to the Lp-primal SVM problem, if Σ_{k=1}^∞ μ_(k+1)/μ_(k)² < ∞ and lim_{k→∞} μ_(k)(e_(k+1) − e_(k)) = 0.

Note that, unlike Theorem 1 for the exact ALM method, the above statement only guarantees convergence but does not specify the rate of convergence for the inexact ALM method. Although the exact convergence rate of the inexact ALM method is difficult to obtain in theory, extensive numerical experiments have shown that, for geometrically increasing μ, it still converges Q-linearly (Gill & Robinson, 2012; Lin et al., 2009).

Proof: Our proof here is based on Theorem 1, by comparing the differences between {w_(k)}, {b_(k)}, {e_(k)}, {λ_(k)} in Algorithm 1 and Algorithm 2. For distinction, in this proof we denote the sequences of Algorithm 1 by {ŵ_(k)}, {b̂_(k)}, {ê_(k)} and {λ̂_(k)}, respectively. According to Xᵀw_(k) + 1b_(k) − y + e_(k) = (λ_(k) − λ_(k−1))/μ_(k) from Eq. (9) and the boundedness of {λ_(k)}, we have

  lim_{k→∞} Xᵀw_(k) + 1b_(k) − y + e_(k) = 0    (34)

So (w_(k), b_(k), e_(k)) approaches a feasible solution. Further, the boundedness of {λ_(k)} and {λ̂_(k)} leads to:

  ‖e_(k+1) − e_(k)‖ = O(μ_(k)^{−1} ‖λ̂_(k+1) − λ_(k+1)‖) = O(μ_(k)^{−1})

Since Σ_{k=1}^∞ μ_(k)^{−1} ≤ Σ_{k=1}^∞ (μ_(k) + μ_(k+1))/μ_(k)² ≤ Σ_{k=1}^∞ 2μ_(k+1)/μ_(k)² < ∞, {e_(k)} is a Cauchy sequence and has a limit e_(∞). Then, with Eq. (34), w_(k) and b_(k) also have their corresponding limits w_(∞) and b_(∞). So (w_(∞), b_(∞), e_(∞)) is a feasible solution. On the other side, we have the optimality condition:

  λ_(k) ∈ ∂‖w_(k)‖₂,   λ_(k) ∈ ∂‖C(y ⊙ e_(k))_+‖_p    (35)

Thus, by the convexity of norms (for 1 ≤ p ≤ 2), we have:

  ‖w_(k)‖² + ‖C(y ⊙ e_(k))_+‖_p
  ≤ obj* − ⟨λ̂_(k), ŵ_(k) − w_(k)⟩ − ⟨λ_(k), ê_(k) − e_(k)⟩
  = obj* − μ_(k)^{−1}⟨λ_(k), λ_(k) − λ_(k−1)⟩ + μ_(k)^{−1}⟨λ_(k), λ̂_(k) − λ̂_(k−1)⟩ − ⟨μ_(k)(e_(k) − e_(k−1)), ŵ_(k) − w_(k)⟩    (36)

The second and third terms approach zero due to the boundedness of {λ_(k)} and {λ̂_(k)}. The last term tends to vanish due to the boundedness of {w_(k)} and {ŵ_(k)}, together with the assumption lim_{k→∞} μ_(k)(e_(k+1) − e_(k)) = 0. So when k → ∞, Eq. (36) becomes

  ‖w_(∞)‖² + ‖C(y ⊙ e_(∞))_+‖_p ≤ obj*    (37)

So (w_(∞), b_(∞), e_(∞)) is an optimal solution to the Lp-primal SVM problem. □

4. Experiments

This paper follows the concepts of reproducible research. All results presented in the manuscript are reproducible using the code and public datasets available online at https://sites.google.com/site/svmalm. All experiments are conducted on an 8-core Intel Xeon X5460 3.16GHz (12M Cache, 1333MHz FSB) Linux server with 32G memory. For all experiments except those in §4.3, we use the default value ε = 0.01, as in LibLinear. We terminate the algorithms when the objectives' changes are less than 10⁻⁴. In our method, we empirically set the maximum iteration number to 100, because in all our experiments our algorithm converges within 100 iterations.

We use nine popularly adopted benchmark datasets from various sources for the performance evaluations: UCI Forest (Collobert et al., 2002) (n = 581012, d = 54), ijcnn1 (Chang & Lin, 2001) (n = 191681, d = 22), Webpage (Platt, 1999) (n = 64700, d = 300), UCI Connect-4 (Frank & Asuncion, 2010) (n = 67557, d = 126), SensIT Vehicle (acoustic/seismic) (Duarte & Hu, 2004) (both n = 98528, d = 50), Shuttle (Hsu & Lin, 2002) (n = 58000, d = 9), UCI Poker (Frank & Asuncion, 2010) (n = 1025010, d = 10), and Epsilon (Sonnenburg et al., 2008) (n = 500000, d = 2000). The Epsilon dataset has very dense features and was used in many previous large-scale data classifications. Five-fold cross validation is conducted (except in §4.3, where all samples are used for training), as in (Chang et al., 2008). For multi-class classification, we follow the default one-versus-the-rest strategy of (Chang & Lin, 2011) and (Fan et al., 2008), and simply rely on the existing modules in the LibLinear software toolbox. The average training time is reported.

4.1. How Does the Training Time Vary with n?

Fig. 1 shows log-log plots of how the CPU time used for training increases with respect to n, the number of training samples. Because when n is small the training time is too short to be measured accurately, we run each test 10 times and report the total training time in Fig. 1. Lines in a log-log plot correspond to polynomial growth O(n^l), where l corresponds to the slope of the line. It is seen from Fig. 1 that the training time of both the exact SVM-ALM and the inexact SVM-ALM is roughly linear with respect to n, since the slopes of the lines representing the various datasets are very close to 1. Together with the theoretical analysis in §2 that one iteration of the inexact SVM-ALM algorithm costs O(n d̄), Algorithm 2 is shown to be a linear computational cost solver for the Lp-primal SVM. Note that an advantage of our algorithms is that the training time (and obviously the testing time as well) is completely independent of the weight C and the norm p.
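The slope reading used above can be reproduced with a least-squares fit in log-log space. The sketch below uses made-up (n, seconds) pairs, not our measured timings; a fitted exponent l close to 1 indicates O(n) scaling:

```python
import numpy as np

# Hypothetical (n, seconds) measurements growing roughly linearly in n.
n_samples = np.array([1e2, 1e3, 1e4, 1e5])
train_time = np.array([0.02, 0.21, 1.9, 20.5])

# Fit log10(time) = l * log10(n) + c; l estimates the exponent in O(n^l).
l, c = np.polyfit(np.log10(n_samples), np.log10(train_time), 1)
print(round(l, 2))
```

The same fit applied to each dataset's timing curve gives the per-dataset slopes discussed in §4.1.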
Figure 1. Training time of the proposed exact SVM-ALM (Algorithm 1) and inexact SVM-ALM (Algorithm 2) as a function of n. (Four log-log panels: the L1-primal and the L2-primal SVM, each solved by the inexact and by the exact SVM-ALM, with curves for all nine datasets.)

4.2. Prediction Accuracy Comparison between the Exact and Inexact SVM-ALM Algorithms

A natural drawback of the inexact SVM-ALM algorithm is that it still requires μ to augment to infinity for obtaining the exact optimal solution, as analyzed in the proof of Theorem 2. This property is similar to that of the IT algorithms (Wright et al., 2009). However, owing to the precision limit discussed in §3, μ cannot increase to infinity in practical implementations of the inexact SVM-ALM Algorithm 2. So a potential concern is that the speedup of the inexact SVM-ALM over the exact SVM-ALM comes at the expense of prediction accuracy; but this is not the case in fact, as verified experimentally in this subsection.

Fig. 2 shows the difference in prediction accuracy between the classification models produced by the inexact SVM-ALM and the exact SVM-ALM. For better readability, the C axis is plotted in log scale, and the difference is shown in percentage points. A positive value indicates that the inexact SVM-ALM has higher prediction accuracy, while a negative value indicates that the exact SVM-ALM performs better. For almost all values of C, both algorithms perform almost identically. In particular, there is no indication that the models learned by the inexact SVM-ALM are less accurate. On the contrary, the prediction accuracy of the inexact SVM-ALM may be slightly better than that of the exact SVM-ALM; such phenomena are reasonable, because it has been reported that some implementations of SVM solvers achieve higher accuracy before the objective function reaches its minimum (Chang et al., 2008).

Figure 2. Prediction accuracy difference between the inexact SVM-ALM (Algorithm 2) and the exact SVM-ALM (Algorithm 1) for the L1-primal and L2-primal SVMs as a function of C, for all nine datasets.

4.3. Training Time Comparison

The proposed Algorithm 2 is compared with the state-of-the-art solvers SVMperf, Pegasos, BMRM (Bundle Method for Regularized Risk Minimization) (Teo et al., 2010), and the LibLinear toolbox that integrates the TRON, PCD and DCD algorithms. The L1-primal SVM cannot be solved by the PCD (Chang et al., 2008), because its objective function Eq. (2) is non-differentiable; thus the PCD is missing from the test for the L1-primal SVM. As a convention (Joachims, 2006; Shalev-Shwartz et al., 2007; Chang et al., 2008; Hsieh et al., 2008; Lin et al., 2008), SVMperf and Pegasos are typically only tested for the L1-primal SVM, and the TRON method for the L2-primal SVM.

Because the TRON, PCD and DCD algorithms do not support the bias term b, we extend each instance by an additional dimension with a large constant T = 10³, as instructed in (Hsieh et al., 2008; Lin et al., 2008). As long as the constant T in the additional dimension is sufficiently large, such a conversion is equivalent to supporting the training of the bias term b.

With the same settings as in (Chang et al., 2008) and (Hsieh et al., 2008), we compare the L1-SVM and L2-SVM solvers in terms of the training time needed to reduce the objective function obj(·) such that the relative difference to the optimum obj*, that is (obj − obj*)/obj*, is within 0.01. In order to obtain the reference solutions, we run TRON with the strict stopping condition ‖∇obj(w)‖ ≤ 0.001. Since the objective functions are stable under such strict stopping conditions, these solutions are seen to be very close to the ground-truth optima.

The results are listed in Tables 1 and 2, from which it is seen that the proposed algorithm has stable performance and is on average faster than its competitors. The advantage of the proposed algorithm is more obvious on large datasets, such as the UCI Forest, SensIT Vehicle, and UCI Poker datasets. The DCD algorithm is not stable, as it may get stuck on some test cases but converges extremely fast on others. When the dimensionality of the features increases to 2000, as in the Epsilon data, our algorithm still performs well, and is the fastest solver for the L1-SVM and the second fastest solver for the L2-SVM.

4.4. The Optimal p for the Lp-Primal SVM

A natural advantage of our proposed algorithms is that they can solve the primal SVM with Lp-norm loss functions for any p ≥ 1. It is not difficult to see that it would be merely coincidental for either p = 1 or p = 2 to yield the highest prediction accuracy of the Lp-primal SVM among all possible values of p.

Table 1. The training time (seconds) for an L1-SVM solver to reduce obj(·) to within 1% of the optimal value. Though the training time of the proposed algorithms is independent of C, the training time of SVMperf, TRON, PCD and DCD may be affected by C. Following (Chang et al., 2008) and (Hsieh et al., 2008), we set C = 1 for a fair comparison. The training time is measured and averaged over 10 runs. The solver with the shortest running time is boldfaced.
  DATASET     OUR    PEGASOS  SVMperf  DCD    BMRM
  FOREST      4.1    74.1     139.2    500    51.8
  IJCNN1      3.2    87.9     105.6    7.8    63.5
  WEBPAGE     4.6    38.3     62.1     3.6    30.2
  CONNECT-4   2.6    54.2     122.6    500    42.9
  SENSIT (A)  3.9    128.7    399.8    17.0   102.5
  SENSIT (S)  3.9    109.3    335.9    11.1   85.2
  SHUTTLE     1.2    29.6     66.6     2.2    20.6
  POKER       4.9    107.4    303.1    500    80.6
  EPSILON     31.1   396.4    500      93.2   315.2

Thus we conduct an interesting experiment showing this phenomenon. Because existing SVM solvers cannot solve the Lp-primal SVM for p other than 1 or 2, we believe that we are the first to report such results, shown in Table 3.

5. Conclusion

This paper proposed a novel linear computational cost primal SVM solver using the ALM algorithm for both the L1-norm and the L2-norm loss functions. To avoid the difficulty of dealing with piecewise loss functions, an auxiliary vector is introduced such that in each iteration the auxiliary vector and the support vector are alternately optimized along the direction of the Lagrange multipliers. In extensive experiments, our approach is consistently faster than other state-of-the-art solvers. From the methodological perspective, the proposed algorithm is novel and entirely different from the existing literature.

Table 2. The training time (seconds) for an L2-SVM solver to reduce obj(·) to within 1% of the optimal value when C = 1, the same as in Table 1. The training time is measured and averaged over 10 runs. The solver with the shortest running time is boldfaced.

  DATASET     OUR    TRON   PCD    DCD    BMRM
  FOREST      3.9    92.3   10.0   500    50.6
  IJCNN1      3.2    7.7    3.4    7.5    64.2
  WEBPAGE     4.4    2.2    0.9    3.9    32.1
  CONNECT-4   2.7    10.4   3.9    500    39.7
  SENSIT (A)  3.9    27.7   5.3    17.5   99.8
  SENSIT (S)  3.7    28.1   4.9    10.9   86.1
  SHUTTLE     1.2    3.6    0.9    2.4    21.1
  POKER       5.1    59.7   7.1    500    79.8
  EPSILON     32.6   241.9  16.9   83.2   329.2

Table 3. Prediction accuracy of the L1-SVM, L2-SVM and Lp-SVM, where p is tuned by trying the parameter set {1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2}.
  DATASET     L1-SVM  L2-SVM  Lp-SVM  p
  FOREST      68.1%   65.3%   71.0%   1.3
  IJCNN1      67.3%   74.2%   74.6%   1.9
  WEBPAGE     57.3%   59.7%   63.4%   1.6
  CONNECT-4   49.3%   44.9%   51.8%   1.2
  SENSIT (A)  43.5%   45.9%   47.3%   1.8
  SENSIT (S)  41.6%   42.4%   46.8%   1.6
  SHUTTLE     35.9%   29.7%   36.1%   1.1
  POKER       31.5%   33.8%   36.9%   1.7
  EPSILON     42.9%   40.3%   44.6%   1.4

Acknowledgments

Corresponding Author: Heng Huang (heng@uta.edu). This work was partially supported by US NSF IIS-1117965, IIS-1302675, IIS-1344152.

References

Bottou, Leon. Stochastic gradient descent examples, 2007. http://leon.bottou.org/projects/sgd.

Chang, Chih-Chung and Lin, Chih-Jen. IJCNN 2001 challenge: Generalization ability and text decoding. In IJCNN, pp. 1031-1036, 2001.

Chang, Chih-Chung and Lin, Chih-Jen. LIBSVM: A library for support vector machines. ACM TIST, 2(3):27:1-27:27, 2011. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.

Chang, Kai-Wei, Hsieh, Cho-Jui, and Lin, Chih-Jen. Coordinate descent method for large-scale L2-loss linear SVM. JMLR, 9:1369-1398, 2008.

Collobert, R., Bengio, S., and Bengio, Y. A parallel mixture of SVMs for very large scale problems. Neural Computation, 14(5):1105-1114, 2002.

Duarte, M. and Hu, Y. H. Vehicle classification in distributed sensor networks. JPDC, 64(7):826-838, 2004.

Fan, Rong-En, Chang, Kai-Wei, Hsieh, Cho-Jui, Wang, Xiang-Rui, and Lin, Chih-Jen. LIBLINEAR: A library for large linear classification. JMLR, 9:1871-1874, 2008.

Frank, A. and Asuncion, A. UCI machine learning repository, 2010. http://archive.ics.uci.edu/ml.

Gill, Philip E. and Robinson, Daniel P. A primal-dual augmented Lagrangian. Computational Optimization and Applications, 51(1):1-25, 2012.

Hsieh, Cho-Jui, Chang, Kai-Wei, Keerthi, S. Sathiya, Sundararajan, S., and Lin, Chih-Jen. A dual coordinate descent method for large-scale linear SVM. In ICML, pp. 408-415, 2008.

Hsu, Chih-Wei and Lin, Chih-Jen. A comparison of methods for multi-class support vector machines. IEEE TNN, 13(2):415-425, 2002.

Joachims, Thorsten. Training linear SVMs in linear time. In KDD, pp. 217-226, 2006.

Keerthi, S. Sathiya and DeCoste, Dennis. A modified finite Newton method for fast solution of large scale linear SVMs. JMLR, 6:341-361, 2005.

Lin, Chih-Jen, Weng, Ruby C., and Keerthi, S. Sathiya. Trust region Newton method for large-scale logistic regression. JMLR, 9:627-650, 2008.

Lin, Zhouchen, Chen, Minming, Wu, Leqin, and Ma, Yi. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. 2009.

Mangasarian, O. L. and Musicant, David R. Lagrangian support vector machines. Journal of Machine Learning Research, 1:161-177, 2001.

Mangasarian, Olvi L. A finite Newton method for classification. Optimization Methods and Software, 17(5):913-929, 2002.

Platt, John C. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning, pp. 185-208, 1999.

Shalev-Shwartz, Shai, Singer, Yoram, and Srebro, Nathan. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, pp. 807-814, 2007.

Smola, Alex J., Vishwanathan, S. V. N., and Le, Quoc. Bundle methods for machine learning. In NIPS, pp. 1377-1384, 2008.

Sonnenburg, Soeren, Franc, Vojtech, Yom-Tov, Elad, and Sebag, Michele. Pascal large scale learning challenge, 2008. http://largescale.ml.tu-berlin.de.

Teo, Choon Hui, Vishwanathan, S. V. N., Smola, Alex, and Le, Quoc V. Bundle methods for regularized risk minimization. JMLR, 11:311-365, 2010.

Wright, John, Ganesh, Arvind, Rao, Shankar, and Ma, Yi. Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization. In NIPS, pp. 2080-2088, 2009.

Zhang, Tong. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In ICML, pp. 919-926, 2004.