
New Primal SVM Solver with Linear Computational Cost for Big Data Classifications

Feiping Nie (FEIPINGNIE@GMAIL.COM), Yizhen Huang (HUANG.YIZHEN@GMAIL.COM), Xiaoqian Wang (XIAOQIAN.WANG93@MAVS.UTA.EDU), Heng Huang (HENG@UTA.EDU)
Computer Science and Engineering Department, University of Texas at Arlington, Arlington, TX, 76019

Abstract

Support Vector Machines (SVM) are among the most popular classification techniques in machine learning, so designing fast primal SVM algorithms for large-scale datasets has been a hot topic in recent years. This paper presents a new L2-norm regularized primal SVM solver using Augmented Lagrange Multipliers (ALM), with linear computational cost for Lp-norm loss functions. The most computationally intensive step of the proposed algorithm (the one that determines the algorithmic complexity) is purely and simply matrix-by-vector multiplication, which can be easily parallelized on a multi-core server for parallel computing. We implement and integrate our algorithm into the interfaces and framework of the well-known LibLinear software toolbox. Experiments show that our algorithm has stable performance and is on average faster than state-of-the-art solvers such as SVMperf, Pegasos, and the LibLinear solvers built on the TRON, PCD and DCD algorithms.

1. Introduction

Because most areas of science, simulation and experiment are flooded with big data, there is an urgent need to develop large-scale data classification techniques. Since SVM is one of the most widely used classification methods, fast algorithms for solving it are in high demand. Given $n$ instance-label pairs $(x_i, y_i)$, $1 \le i \le n$, $x_i \in \mathbb{R}^d$, $y_i \in \{-1, +1\}$, the L2-norm regularized SVM in the primal form aims to optimize the following unconstrained problem:

$$\min_{w,b} \ \frac{1}{2} w^T w + C \sum_{i=1}^{n} \mathrm{loss}\big(w^T x_i + b,\ y_i\big) \qquad (1)$$

where the support vector $w \in \mathbb{R}^d$ and the intercept (i.e. bias term) $b \in \mathbb{R}$ are the variables; $\mathrm{loss}(u, v)$ is a loss function measuring the difference between two scalars $u \in \mathbb{R}$ and $v \in \mathbb{R}$; and $C \in \mathbb{R}$ is the weight adjusting the importance between the regularization term $w^T w$ and the loss term $\sum_{i=1}^{n} \mathrm{loss}(w^T x_i + b, y_i)$.

(Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 2014. JMLR: W&CP volume 32. Copyright 2014 by the author(s).)

If the loss function is chosen to be the hinge loss $\mathrm{loss}(u, v) = \max(1 - uv, 0)$, the problem becomes the L2-norm regularized L1-norm loss primal SVM, a.k.a. the L1-primal SVM in some literature:

$$\min_{w,b} \ \frac{1}{2} w^T w + C \sum_{i=1}^{n} \big(1 - (w^T x_i + b) y_i\big)_+ \qquad (2)$$

where the operator $(u)_+ \stackrel{\mathrm{def}}{=} \max(u, 0)$ returns the input scalar $u \in \mathbb{R}$ unchanged if $u$ is non-negative, and zero otherwise. Such notation is used for better readability. If the loss function is chosen to be the squared hinge loss $\mathrm{loss}(u, v) = \max(1 - uv, 0)^2$, the problem becomes the L2-norm regularized L2-norm loss primal SVM, a.k.a. the L2-primal SVM:

$$\min_{w,b} \ \frac{1}{2} w^T w + C \sum_{i=1}^{n} \big(1 - (w^T x_i + b) y_i\big)_+^2 \qquad (3)$$

The primal SVM is attractive, partly because it assures a continuous decrease in the primal objective function (Keerthi & DeCoste, 2005). Designing fast primal SVM solvers for large-scale datasets has been a hot and important topic in recent years. The method of (Mangasarian & Musicant, 2001) needs to compute the inverse of a matrix of size $(d+1) \times (d+1)$ and is slower than later proposed solvers. (Mangasarian, 2002) and (Keerthi & DeCoste, 2005) proposed modified Newton methods to train
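The objectives of Eqs. (2) and (3) are easy to evaluate directly. The following NumPy sketch is illustrative only (the function name is ours, not part of the paper's code); `p=1` gives the hinge loss of Eq. (2) and `p=2` the squared hinge loss of Eq. (3):

```python
import numpy as np

def primal_svm_objective(w, b, X, y, C, p=1):
    """Evaluate 0.5*w'w + C * sum_i (1 - (w'x_i + b) y_i)_+^p.

    X has one instance x_i per row; y holds labels in {-1, +1}.
    """
    margins = 1.0 - (X @ w + b) * y       # 1 - (w'x_i + b) y_i
    hinge = np.maximum(margins, 0.0)      # the (u)_+ operator
    return 0.5 * (w @ w) + C * np.sum(hinge ** p)

# Two points classified with margin >= 1: every hinge term vanishes,
# leaving only the regularization term 0.5 * w'w.
X = np.array([[2.0], [-2.0]])
y = np.array([1.0, -1.0])
w = np.array([1.0])
print(primal_svm_objective(w, 0.0, X, y, C=1.0, p=1))   # -> 0.5
```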
L2-primal SVM. As Eq. (3) is not twice differentiable, obtaining the Newton direction requires the generalized Hessian matrix (i.e. a generalized second-order derivative), which is not efficient enough. To overcome this limitation, a Trust RegiOn Newton method (TRON) (Lin et al., 2008) was proposed to solve the L2-primal SVM and logistic regression. The toolbox SVMperf (Joachims, 2006) used a cutting-plane technique to solve the L1-primal SVM. (Smola et al., 2008) applied bundle methods and viewed SVMperf as a special case. (Zhang, 2004) proposed a Stochastic Gradient Descent (SGD) method for the primal SVM with any type of loss function; Pegasos (Shalev-Shwartz et al., 2007) extended Zhang's work and developed an algorithm that alternates between stochastic gradient descent steps and projection steps, with better performance than SVMperf. Another stochastic gradient implementation similar to Pegasos was published in (Bottou, 2007). More recently, the L2-primal SVM has been solved by the PCD (Primal Coordinate Descent) algorithm (Chang et al., 2008) using coordinate descent methods.

All the algorithms discussed above are iterative; they update $w$ at each iteration. Based on this understanding, it is natural to find that there is a tradeoff between the computational cost spent in each iteration and the number of iterations needed (Lin et al., 2008). Pegasos randomly subsamples a few instances at a time, so its cost per iteration is low but its number of iterations is high. In contrast, Newton methods such as the TRON method spend a considerable amount of time per iteration but converge at fast rates. The DCD (Dual Coordinate Descent) algorithm (Hsieh et al., 2008) bypasses the operation of maintaining gradients at each iteration in the PCD (Chang et al., 2008), and lowers the algorithmic complexity per iteration from $O(n\bar{d})$ to $O(\bar{d})$ in linear SVM cases, where $\bar{d}$ is the average number of nonzero elements per instance. However, such a reduction in complexity does not apply to nonlinear SVM with kernels. Moreover, as shown in our experiments (see Section 4), the convergence of the DCD may be extremely lengthy for some datasets. In large-scale scenarios, an approximate solution of the optimization problem is usually enough to produce a good model (Lin et al., 2008; Chang et al., 2008; Hsieh et al., 2008). Thus, methods with a low-cost iteration are preferred, as they can quickly generate a reasonable model. However, if one specifies an unsuitable stopping condition, such methods may fall into the situation of lengthy iterations. To address these issues, we propose a new SVM solver using Augmented Lagrange Multipliers (ALM) with simple matrix-by-vector multiplication per iteration, linear computational cost, and provable convergence.

The rest of the manuscript is organized as follows: Section 2 presents our proposed algorithm; Section 3 analyzes its optimality and convergence; experimental results are given in Section 4; Section 5 contains the concluding remarks.

2. Proposed Algorithm

For ease of notation and better extensibility, we first unify Eq. (2) and Eq. (3), and generalize them to the minimization of the Lp-primal SVM objective:

$$\mathrm{obj}(w, b) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \big(1 - (w^T x_i + b) y_i\big)_+^p \qquad (4)$$

where $p \in \mathbb{R}$ is a constant, typically $1 \le p \le 2$ for the objective to be meaningful. A fundamental difficulty in both the L2- and L1-primal SVM is that their loss functions (i.e. hinge loss and squared hinge loss) are piecewise. With this observation and the identity $1 - (w^T x_i + b) y_i = y_i y_i - (w^T x_i + b) y_i = y_i \big(y_i - (w^T x_i + b)\big)$, we introduce auxiliary variables $e_i = y_i - (w^T x_i + b)$, $1 \le i \le n$, and the minimization of Eq. (4) becomes:

$$\min_{w,\, b,\ e_i = y_i - (w^T x_i + b)} \ \frac{1}{2} w^T w + C \sum_{i=1}^{n} (y_i e_i)_+^p \qquad (5)$$

Based on the ALM (Gill & Robinson, 2012), Eq. (5) is turned into an unconstrained optimization problem by studying the Lagrangian function of Eq. (5):

$$L(w, b, e, \lambda) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} (y_i e_i)_+^p + \lambda^T (X^T w + \mathbf{1} b - y + e) \qquad (6)$$

where $X_{d \times n} = [x_1, x_2, \ldots, x_n]$, $\mathbf{1}_{n \times 1} = [1, 1, \ldots, 1]^T$, $y_{n \times 1} = [y_1, y_2, \ldots, y_n]^T$, $e_{n \times 1} = [e_1, e_2, \ldots, e_n]^T$. The last term is the pointwise multiplication of the amount of violation of the $n$ constraints $(w^T x_i + b) - y_i + e_i = 0$ with the vector $\lambda \in \mathbb{R}^n$ consisting of $n$ Lagrangian multipliers. The ALM adds the supplemental term $\frac{\mu}{2} \| X^T w + \mathbf{1} b - y + e \|^2$ to Eq. (6): as $\mu$ "augments" to infinity, this term forces the $n$ constraints to be satisfied. So the augmented Lagrangian function of Eq. (5) is defined as:

$$AL(w, b, e, \lambda, \mu) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} (y_i e_i)_+^p + \lambda^T (X^T w + \mathbf{1} b - y + e) + \frac{\mu}{2} \big\| X^T w + \mathbf{1} b - y + e \big\|^2 \qquad (7)$$

Arranging the last two terms of Eq. (7) into a quadratic form leads to:

$$AL(w, b, e, \lambda, \mu) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} (y_i e_i)_+^p + \frac{\mu}{2} \Big\| X^T w + \mathbf{1} b - y + e + \frac{\lambda}{\mu} \Big\|^2 \qquad (8)$$

Note that from Eq. (7) to Eq. (8), we add a term $\frac{\|\lambda\|^2}{2\mu}$ that remains constant when performing optimization over the variables $w$, $b$ and $e$ within a single iteration. As $\mu \to \infty$, this term is very close to zero and thus eventually negligible. At the $k$-th iteration, similarly to the Iterative Thresholding (IT) method (Wright et al., 2009), the amount of violation of the $n$ constraints is used to update the Lagrangian multiplier vector:

$$\lambda_{(k)} = \lambda_{(k-1)} + \mu_{(k)} (X^T w + \mathbf{1} b - y + e) \qquad (9)$$

The augmented penalty parameter $\mu$ is monotonically non-decreasing over the iterative process. How to determine the penalty series $\mu_{(k)}$ for every iteration $k$ will be discussed in Section 3. We omit the bracketed subscript $(k)$ when there is no risk of confusion, and in fact only a single variable (or array) is needed to store these scalars (or vectors) in an implementation. Note that plain subscripts and bracketed subscripts refer to quite different meanings in this paper.

Remarks: A merit of the ALM is that the optimal step size to update $\lambda$ is proven to be the chosen penalty parameter $\mu_{(k)}$, making the parameter tuning much easier than that of the IT method (Gill & Robinson, 2012).

Now, at each iteration, we can split the updating of $w, b, e$ into two independent portions: minimizing over $e$ with $w, b$ fixed, and minimizing over $w, b$ with $e$ fixed. When $w, b$ are fixed, the term $w^T w$ is constant and Eq. (8) can be decomposed into $n$ independent single-variable minimization problems w.r.t. $e_i$:

$$e_i = \arg\min_{e_i} F_i(e_i) = \arg\min_{e_i} \ C (y_i e_i)_+^p + \frac{\mu}{2} \Big( e_i - \big(y_i - w^T x_i - b - \tfrac{\lambda_i}{\mu}\big) \Big)^2 = \arg\min_{e_i} \ \gamma (y_i e_i)_+^p + \frac{1}{2} (e_i - t_i)^2 \qquad (10)$$

where $\gamma = C/\mu$, $\lambda_i$ is the $i$-th element of $\lambda$, and $t_i = y_i - w^T x_i - b - \frac{\lambda_i}{\mu}$ is a constant. Solving Eq. (10) is easy, as $e_i$ is the minimizer of the single-variable two-piece piecewise function $F_i(e_i)$: we just need to find its minima for $y_i e_i \le 0$ and $y_i e_i > 0$ separately and pick the smaller one. When $y_i e_i \le 0$, $(y_i e_i)_+^p = 0$, so we only need to pick the smaller between $F_i(0)$ and $F_i(t_i)$. When $y_i e_i > 0$, we need to solve the equation:

$$\frac{\partial F_i}{\partial e_i} = p \gamma (y_i e_i)^{p-1} + e_i - t_i = 0 \qquad (11)$$

For arbitrary given $p$ and $\gamma$, solving Eq. (11) is difficult. But fortunately, in our scenario it always holds that $p \ge 1$ and $\gamma > 0$, so $\frac{\partial F_i}{\partial e_i}$ is monotonically increasing w.r.t. $e_i$, and we can use the well-known binary search method to narrow the possible range of $e_i$ by half with each operation, obtaining an $\varepsilon$-accurate solution in $O(\log 1/\varepsilon)$ time. In particular, we can write the explicit solutions straightforwardly when $p = 1$ or $2$.

For the L1-primal SVM ($p = 1$), with $y_i = \pm 1$:

$$e_i = \begin{cases} t_i - y_i \gamma & \text{when } y_i t_i > \gamma \\ 0 & \text{when } 0 \le y_i t_i \le \gamma \\ t_i & \text{when } y_i t_i < 0 \end{cases} \qquad (12)$$

For the L2-primal SVM ($p = 2$), with $y_i = \pm 1$:

$$e_i = \begin{cases} t_i / (1 + 2\gamma) & \text{when } y_i t_i > 0 \\ t_i & \text{when } y_i t_i \le 0 \end{cases} \qquad (13)$$

When $e$ is fixed, the term $\sum_{i=1}^{n} (y_i e_i)_+^p$ is constant, and Eq. (8) becomes an L2-norm regularized Least Squares Regression (LSR) problem:

$$G(w, b) = \min_{w,b} \ \frac{1}{\mu} w^T w + \big\| X^T w + \mathbf{1} b - z \big\|^2 \qquad (14)$$

where $z = y - e - \frac{\lambda}{\mu}$ is a constant vector. Eq. (14) can be turned into a standard LSR problem as below, if we set $v = \begin{bmatrix} w \\ b \end{bmatrix}$, $A = \begin{bmatrix} X^T & \mathbf{1} \\ \mu^{-1/2} I & 0 \end{bmatrix}$ and the target $u = \begin{bmatrix} z \\ 0 \end{bmatrix}$:

$$G(w, b) = G(v) = \min_{v} \ \| A v - u \|^2 \qquad (15)$$

which can be resolved by many standard libraries, such as the default LSQR function in MATLAB. Hereby we finish the illustration of the proposed exact SVM-ALM algorithm, whose details are summarized in Algorithm 1. To be compatible with existing methods such as (Fan et al., 2008; Chang et al., 2008; Hsieh et al., 2008; Lin et al., 2008), the stopping condition is set to $\|\nabla \mathrm{obj}(w, b)\| \le \epsilon$, where the user-specified parameter $\epsilon$ is 0.01 by default, and $\nabla \mathrm{obj}(w, b)$ is the gradient of $\mathrm{obj}(w, b)$ w.r.t. $w$.

However, the LSQR used here costs $O(n \bar{d}^2)$, where $\bar{d}$ is the average number of nonzero elements per instance, which is as costly as computing a matrix inverse. This is too expensive, as we need to afford such computation every iteration. Driven by this consideration and the tradeoff between cost per iteration and number of iterations discussed in the introduction, we instead use an optimal step-size gradient method to update $w$ and $b$ at each iteration. The gradients of $G(w, b)$ w.r.t. $w$ and $b$ are:

$$w_g = \frac{\partial G}{\partial w} = X (X^T w + \mathbf{1} b - z) + \frac{1}{\mu} w, \qquad b_g = \frac{\partial G}{\partial b} = n b + \mathbf{1}^T (X^T w - z) \qquad (16)$$

Finding the optimal step size $s$ is a single-variable quadratic minimization problem:

$$\min_{s} \ \frac{1}{\mu} (w - s w_g)^T (w - s w_g) + \big\| X^T (w - s w_g) + \mathbf{1} (b - s b_g) - z \big\|^2 \qquad (17)$$

which has the explicit solution:

$$s = \frac{(X^T w_g + \mathbf{1} b_g)^T (X^T w + \mathbf{1} b - z) + \frac{1}{\mu} w_g^T w}{(X^T w_g + \mathbf{1} b_g)^T (X^T w_g + \mathbf{1} b_g) + \frac{1}{\mu} w_g^T w_g} = \frac{w_g^T w_g + b_g^T b_g}{(X^T w_g + \mathbf{1} b_g)^T (X^T w_g + \mathbf{1} b_g) + \frac{1}{\mu} w_g^T w_g} \qquad (18)$$

The last equality simply simplifies the computation of $s$, and can be verified by substituting Eq. (16) for $w_g$ and $b_g$ in the numerator. We prefer the simplified formula because it saves two matrix-by-vector multiplications.
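The closed forms of Eqs. (12) and (13) can be sanity-checked against a brute-force grid minimization of the one-dimensional subproblem in Eq. (10). The snippet below is an illustration only; the grid range and tolerance are arbitrary choices of ours:

```python
import numpy as np

def e_update(t, yi, gamma, p):
    """Closed-form minimizer of Eq. (10): gamma*(y*e)_+^p + 0.5*(e - t)^2."""
    if p == 1:                                   # Eq. (12)
        if yi * t > gamma:
            return t - yi * gamma
        return t if yi * t < 0 else 0.0
    if p == 2:                                   # Eq. (13)
        return t / (1.0 + 2.0 * gamma) if yi * t > 0 else t
    raise ValueError("only p = 1 or p = 2 have closed forms here")

def brute_force(t, yi, gamma, p):
    """Dense grid search over the same objective, for checking only."""
    grid = np.linspace(-5.0, 5.0, 200001)
    obj = gamma * np.maximum(yi * grid, 0.0) ** p + 0.5 * (grid - t) ** 2
    return grid[np.argmin(obj)]

# Example: p = 1, y_i = 1, gamma = 0.5, t = 2.3 falls in the y_i*t > gamma
# branch, so the closed form shifts t by gamma (soft-thresholding).
print(e_update(2.3, 1.0, 0.5, 1))   # -> 1.8
```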
Algorithm 1: Exact SVM-ALM for the Lp-primal SVM
Input: $p$; $X$; $y$; $\mu_{(1)}, \mu_{(2)}, \ldots, \mu_{(\infty)}$
Initialize $w = \mathbf{1}$, $b = 0$, $\lambda = \mathbf{0}$.
repeat
  1. Update $e$ with Eq. (11), Eq. (12) or Eq. (13).
  2. Update $w, b$ using the LSQR with Eq. (15).
  3. Update $\lambda$ with Eq. (9).
until $\|\nabla \mathrm{obj}(w, b)\| \le \epsilon$

Algorithm 2: Inexact SVM-ALM for the Lp-primal SVM
Input: $p$; $X$; $y$; $\mu_{(1)}, \mu_{(2)}, \ldots, \mu_{(\infty)}$
Initialize $w = \mathbf{1}$, $b = 0$, $\lambda = \mathbf{0}$.
repeat
  1. Update $e$ with Eq. (11), Eq. (12) or Eq. (13).
  2. Update $w$ by $w - s w_g$ and $b$ by $b - s b_g$, where $w_g$, $b_g$ and $s$ are computed with Eq. (16) and Eq. (18), respectively.
  3. Update $\lambda$ with Eq. (9).
until $\|\nabla \mathrm{obj}(w, b)\| \le \epsilon$

We summarize the proposed inexact SVM-ALM algorithm in Algorithm 2. At each iteration, Algorithm 2 needs only three matrix-by-vector multiplications, with complexity $O(n \bar{d})$, where $\bar{d}$ is the average number of nonzero elements per instance. The several pointwise additions and multiplications between vectors have complexity either $O(\bar{d})$ or $O(n)$, and can be neglected compared to $O(n \bar{d})$. In large-scale data classifications, high-dimensional features are usually reduced by a prescreening procedure, hence $\bar{d}$ is not large. Our new algorithm thus has linear computational cost w.r.t. the number of data instances $n$.

3. Convergence and Optimality

We first prove some lemmas.

Lemma 1. Let $H$ be a real Hilbert space endowed with an inner product $\langle \cdot, \cdot \rangle$ and a corresponding norm $\|\cdot\|$, and let $v \in \partial \|u\|$, where $\partial f(u)$ is the subgradient of $f(u)$. Then $\|v\|^* = 1$ if $u \ne 0$, and $\|v\|^* \le 1$ if $u = 0$, where $\|\cdot\|^*$ is the dual norm of $\|\cdot\|$.

Proof: Because $v \in \partial \|u\|$,

$$\|d\| - \|u\| \ge \langle v, d - u \rangle, \quad \forall d \in H \qquad (19)$$

If $u \ne 0$, setting $d = 0$ and $d = 2u$ leads to

$$\|u\| = \langle v, u \rangle \le \|u\| \, \|v\|^* \qquad (20)$$

Thus we have $\|v\|^* \ge 1$. On the other side, we have

$$\|d - u\| \ge \|d\| - \|u\| \ge \langle v, d - u \rangle, \quad \forall d \in H \qquad (21)$$

which leads to

$$\frac{\langle v, d - u \rangle}{\|d - u\|} \le 1, \quad \forall d \ne u \qquad (22)$$

So $\|v\|^* \le 1$. It can then be concluded that $\|v\|^* = 1$. If $u = 0$, Eq. (19) is equivalent to

$$\langle v, d \rangle \le 1, \quad \forall \, \|d\| = 1 \qquad (23)$$

So $\|v\|^* \le 1$ by the definition of the dual norm. ∎
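As a quick numerical illustration of Lemma 1 for the Euclidean norm (which is its own dual): at any $u \ne 0$, the subgradient of $\|u\|_2$ is $u / \|u\|_2$, whose dual norm is exactly 1. The snippet is ours, for intuition only:

```python
import numpy as np

u = np.array([3.0, -4.0])
v = u / np.linalg.norm(u)    # the (unique) subgradient of ||.||_2 at u != 0
print(np.linalg.norm(v))     # its dual (= Euclidean) norm -> 1.0
```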
Lemma 2. The sequence $\{\lambda_{(k)}\}$ in either Algorithm 1 or Algorithm 2 is bounded.

Proof: From $w_{(k)} = \arg\min_{w,b} AL(w, b, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)})$, $b_{(k)} = \arg\min_{w,b} AL(w, b, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)})$, and $e_{(k)} = \arg\min_{e} AL(w_{(k-1)}, b_{(k-1)}, e, \lambda_{(k-1)}, \mu_{(k)})$, we have:

$$0 \in \partial_w AL(w_{(k)}, b_{(k)}, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)}), \quad 0 \in \partial_b AL(\cdot), \quad 0 \in \partial_e AL(\cdot) \qquad (24)$$

which, writing $\odot$ for the elementwise product, indicates:

$$0 \in \partial \|w_{(k)}\|_2 - \lambda_{(k-1)} - \mu_{(k)} (X^T w + \mathbf{1} b - y + e), \qquad 0 \in \partial \big\| C (y \odot e_{(k)})_+ \big\|_p - \lambda_{(k-1)} - \mu_{(k)} (X^T w + \mathbf{1} b - y + e) \qquad (25)$$

Therefore, by Eq. (9),

$$\lambda_{(k)} \in \partial \|w_{(k)}\|_2, \qquad \lambda_{(k)} \in \partial \big\| C (y \odot e_{(k)})_+ \big\|_p \qquad (26)$$

According to Lemma 1, the sequence $\{\lambda_{(k)}\}$ in Algorithm 1 is bounded, because the dual norms of $\|\cdot\|_2$ and $\|\cdot\|_p$ are $\|\cdot\|_2$ and $\|\cdot\|_{\frac{p}{p-1}}$ (Lin et al., 2009), respectively. The boundedness of $\{\lambda_{(k)}\}$ in Algorithm 2 can be proved in the same way. ∎

Lemma 3. The sequences $\{w_{(k)}\}$, $\{b_{(k)}\}$, $\{e_{(k)}\}$ in either Algorithm 1 or Algorithm 2 are all bounded, if

$$\|w_{(k+1)}\|^2 + \big\|C(y \odot e_{(k+1)})_+\big\|_p + 0.5\,\mu_{(k+1)} \big\|X^T w_{(k+1)} + \mathbf{1} b_{(k+1)} - y + e_{(k+1)}\big\|^2 \le \|w_{(k)}\|^2 + \big\|C(y \odot e_{(k)})_+\big\|_p + 0.5\,\mu_{(k)} \big\|X^T w_{(k)} + \mathbf{1} b_{(k)} - y + e_{(k)}\big\|^2$$

for every $k > 0$, and $\sum_{k=1}^{\infty} \frac{\mu_{(k+1)}}{\mu_{(k)}^2} < \infty$.

Proof: As $\|w\|^2 + \|C(y \odot e)_+\|_p + 0.5\,\mu \|X^T w + \mathbf{1} b - y + e\|^2$ is non-increasing as Algorithm 1 iterates, it can be verified that

$$AL(w_{(k)}, b_{(k)}, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)}) \le AL(w_{(k-1)}, b_{(k-1)}, e_{(k-1)}, \lambda_{(k-2)}, \mu_{(k-1)}) + \frac{0.5\,(\mu_{(k-1)} + \mu_{(k)})}{\mu_{(k-1)}^2} \big\|\lambda_{(k-1)} - \lambda_{(k-2)}\big\|^2 \qquad (27)$$

The above inequality can be derived by substituting with Eq. (9) to eliminate $\lambda_{(k-1)}$. So $\{AL(w_{(k)}, b_{(k)}, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)})\}$ is upper bounded, owing to the boundedness of $\{\lambda_{(k)}\}$ and $\sum_{k=1}^{\infty} \frac{\mu_{(k)} + \mu_{(k+1)}}{\mu_{(k)}^2} \le \sum_{k=1}^{\infty} \frac{2\mu_{(k+1)}}{\mu_{(k)}^2} < \infty$. Thus we have

$$\|w_{(k)}\|^2 + \big\|C(y \odot e_{(k)})_+\big\|_p = AL(w_{(k)}, b_{(k)}, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)}) - \frac{\|\lambda_{(k)}\|^2}{2\mu_{(k)}} \qquad (28)$$

as upper bounded. Therefore $\{w_{(k)}\}$ and $\{e_{(k)}\}$ in Algorithm 1 are both bounded, which leads to the boundedness of $\{b_{(k)}\}$, as $X^T w + \mathbf{1} b - y + e \to 0$. It can be verified that exactly the same properties hold in Algorithm 2. ∎
The non-increasing requirement on $\|w\|^2 + \|C(y \odot e)_+\|_p + 0.5\,\mu \|X^T w + \mathbf{1} b - y + e\|^2$ in Lemma 3 also implies the way to generate the sequence $\{\mu_{(k)}\}$, by setting the upper limit of $\mu_{(k)}$:

$$\mu_{(k+1)} = \frac{0.5\,\mu_{(k)} \big\|X^T w_{(k)} + \mathbf{1} b_{(k)} - y + e_{(k)}\big\|^2 + \|w_{(k)}\|^2 - \|w_{(k+1)}\|^2 + \big\|C(y \odot e_{(k)})_+\big\|_p - \big\|C(y \odot e_{(k+1)})_+\big\|_p}{0.5 \big\|X^T w_{(k+1)} + \mathbf{1} b_{(k+1)} - y + e_{(k+1)}\big\|^2} \qquad (29)$$

Because of Eqs. (10) and (14), we have

$$AL(w_{(k)}, b_{(k)}, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)}) \le AL(w_{(k-1)}, b_{(k-1)}, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)}) \le AL(w_{(k-1)}, b_{(k-1)}, e_{(k-1)}, \lambda_{(k-1)}, \mu_{(k)})$$

which ensures that $\{\mu_{(k)}\}$ is non-decreasing.

Owing to the precision limit, $\mu$ cannot increase to infinity in practical implementations of both Algorithm 1 and Algorithm 2; otherwise the significant digits of the terms $\frac{1}{2} w^T w$ and $C \sum_{i=1}^{n} (y_i e_i)_+^p$ in $AL(w, b, e, \lambda, \mu)$ would be squeezed out by the extremely large term $\frac{\mu}{2} \|X^T w + \mathbf{1} b - y + e + \frac{\lambda}{\mu}\|^2$. More specifically, $\mu$ has an upper limit of $10^5$ as an implementation detail. We follow the convention of most existing work by using double-precision floating-point numbers. Using single precision, e.g. (Bottou, 2007), may reduce the computational time in some situations, but this setting may cause numerical inaccuracy (Chang et al., 2008). An advantage of the ALM is that it converges to the exact optimal solution before $\mu$ augments to infinity (Gill & Robinson, 2012). In contrast, strictly speaking, the IT method (Wright et al., 2009) only finds approximate solutions.

Now we come to the main results of this section.

Theorem 1. The solution consisting of the limits of the sequences $\{w_{(k)}\}$, $\{b_{(k)}\}$, $\{e_{(k)}\}$ in Algorithm 1 with Eq. (29) for updating $\mu$, say $(w_{(\infty)}, b_{(\infty)}, e_{(\infty)})$, is an optimal solution to the Lp-primal SVM problem, and the convergence rate is at least $O(1/\mu_{(k)})$, in the sense that $\big| \|w_{(k)}\|^2 + \|C(y \odot e_{(k)})_+\|_p - \mathrm{obj}^* \big| = O(1/\mu_{(k)})$, where $\mathrm{obj}^*$ is the minimal value of $\mathrm{obj}$ in Eq. (4).

Proof: As the vital natural property of an ALM algorithm, the following is true:

$$\begin{aligned}
AL(w_{(k)}, b_{(k)}, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)}) &= \min_{w,b,e} AL(w, b, e, \lambda_{(k-1)}, \mu_{(k)}) \\
&\le \min_{w,b,e:\ X^T w + \mathbf{1} b - y + e = 0} AL(w, b, e, \lambda_{(k-1)}, \mu_{(k)}) \\
&= \min_{w,b,e:\ X^T w + \mathbf{1} b - y + e = 0} \|w\|^2 + \big\|C(y \odot e)_+\big\|_p + \frac{\|\lambda_{(k-1)}\|^2}{2\mu_{(k)}} \\
&= \min_{w,b} \|w\|^2 + \big\|C\big(\mathbf{1} - (X^T w + \mathbf{1} b) \odot y\big)_+\big\|_p + \frac{\|\lambda_{(k-1)}\|^2}{2\mu_{(k)}} \\
&= \mathrm{obj}^* + \frac{\|\lambda_{(k-1)}\|^2}{2\mu_{(k)}}
\end{aligned} \qquad (30)$$

The first equality and the second inequality are obvious; the third equality holds because, when the constraints w.r.t. the auxiliary variables $e$ are satisfied, the last term in Eq. (8) degenerates to $\|\lambda_{(k-1)}\|^2 / (2\mu_{(k)})$; the fourth equality is obtained just by substituting the constraints, similarly to the conversion from Eq. (5) to Eq. (4); the fifth equality follows from the definition in Eq. (4). In Algorithm 1, it can be verified that

$$\|w_{(k)}\|^2 + \big\|C(y \odot e_{(k)})_+\big\|_p = AL(w_{(k)}, b_{(k)}, e_{(k)}, \lambda_{(k-1)}, \mu_{(k)}) - \frac{\|\lambda_{(k)}\|^2}{2\mu_{(k)}} \qquad (31)$$

Based on Eq. (30) we have

$$\|w_{(k)}\|^2 + \big\|C(y \odot e_{(k)})_+\big\|_p \le \mathrm{obj}^* + \frac{\|\lambda_{(k-1)}\|^2}{2\mu_{(k)}} - \frac{\|\lambda_{(k)}\|^2}{2\mu_{(k)}}$$

The proved boundedness of $\{\lambda_{(k)}\}$ in Lemma 2 leads to:

$$\mathrm{obj}^* - O(1/\mu_{(k)}) \le \|w_{(k)}\|^2 + \big\|C(y \odot e_{(k)})_+\big\|_p \le \mathrm{obj}^* + O(1/\mu_{(k)})$$

Note that the range $[\mathrm{obj}^* - O(1/\mu_{(k)}),\ \mathrm{obj}^* + O(1/\mu_{(k)})]$ is derived because the term $O(1/\mu_{(k)})$ may be either positive or negative. Hereby the convergence rate is proved. When $k \to \infty$, $O(1/\mu_{(k)})$ is negligible, so

$$\|w_{(\infty)}\|^2 + \big\|C(y \odot e_{(\infty)})_+\big\|_p = \mathrm{obj}^* \qquad (32)$$

According to Eq. (9), the constraints $X^T w_{(k)} + \mathbf{1} b_{(k)} - y + e_{(k)} = \frac{1}{\mu_{(k)}} (\lambda_{(k)} - \lambda_{(k-1)})$ are satisfied as $k \to \infty$:

$$X^T w_{(\infty)} + \mathbf{1} b_{(\infty)} - y + e_{(\infty)} = 0 \qquad (33)$$

Therefore, $(w_{(\infty)}, b_{(\infty)}, e_{(\infty)})$ is an optimal solution to the Lp-primal SVM problem. ∎

Theorem 2. The solution consisting of the limits of the sequences $\{w_{(k)}\}$, $\{b_{(k)}\}$, $\{e_{(k)}\}$ in Algorithm 2 with Eq. (29) for updating $\mu$, say $(w_{(\infty)}, b_{(\infty)}, e_{(\infty)})$, is an optimal solution to the Lp-primal SVM problem, if $\sum_{k=1}^{\infty} \frac{\mu_{(k+1)}}{\mu_{(k)}^2} < \infty$ and $\lim_{k \to \infty} \mu_{(k)} (e_{(k+1)} - e_{(k)}) = 0$.

Note that, unlike Theorem 1 for the exact ALM method, the above statement only guarantees convergence but does not specify the rate of convergence for the inexact ALM method. Although the exact convergence rate of the inexact ALM method is difficult to obtain in theory, extensive numerical experiments have shown that for geometrically increasing $\mu_{(k)}$ it still converges Q-linearly (Gill & Robinson, 2012; Lin et al., 2009).

Proof: Our proof here is based on Theorem 1, by comparing the difference of $\{w_{(k)}\}$, $\{b_{(k)}\}$, $\{e_{(k)}\}$ and $\{\lambda_{(k)}\}$ in Algorithm 1 and Algorithm 2. For distinction, in this proof we denote the sequences of Algorithm 1 as $\{\hat w_{(k)}\}$, $\{\hat b_{(k)}\}$, $\{\hat e_{(k)}\}$ and $\{\hat \lambda_{(k)}\}$, respectively. According to $X^T w_{(k)} + \mathbf{1} b_{(k)} - y + e_{(k)} = \frac{1}{\mu_{(k)}} (\lambda_{(k)} - \lambda_{(k-1)})$ from Eq. (9) and the boundedness of $\{\lambda_{(k)}\}$, we have

$$\lim_{k \to \infty} X^T w_{(k)} + \mathbf{1} b_{(k)} - y + e_{(k)} = 0 \qquad (34)$$

So $(w_{(k)}, b_{(k)}, e_{(k)})$ approaches a feasible solution. Further, the boundedness of $\{\lambda_{(k)}\}$ and $\{\hat \lambda_{(k)}\}$ leads to:

$$\|e_{(k+1)} - e_{(k)}\| = O\Big( \tfrac{1}{\mu_{(k)}} \|\hat \lambda_{(k+1)} - \lambda_{(k+1)}\| \Big) = O(1/\mu_{(k)})$$

Since $\sum_{k=1}^{\infty} \frac{1}{\mu_{(k)}} \le \sum_{k=1}^{\infty} \frac{\mu_{(k)} + \mu_{(k+1)}}{\mu_{(k)}^2} \le \sum_{k=1}^{\infty} \frac{2\mu_{(k+1)}}{\mu_{(k)}^2} < \infty$, $\{e_{(k)}\}$ is a Cauchy sequence and has a limit $e_{(\infty)}$. Then, with Eq. (34), $w_{(k)}$ and $b_{(k)}$ also have their corresponding limits $w_{(\infty)}$ and $b_{(\infty)}$. So $(w_{(\infty)}, b_{(\infty)}, e_{(\infty)})$ is a feasible solution. On the other side, we have the optimality conditions:

$$\lambda_{(k)} \in \partial \|w_{(k)}\|_2, \qquad \lambda_{(k)} \in \partial \big\|(y \odot e_{(k)})_+\big\|_p \qquad (35)$$

Thus, by the convexity of norms (for $1 \le p \le 2$) we have:

$$\|w_{(k)}\|^2 + \big\|C(y \odot e_{(k)})_+\big\|_p \le \mathrm{obj}^* - \frac{1}{\mu_{(k)}} \big\langle \lambda_{(k)}, \lambda_{(k)} - \lambda_{(k-1)} \big\rangle + \frac{1}{\mu_{(k)}} \big\langle \lambda_{(k)}, \hat \lambda_{(k)} - \hat \lambda_{(k-1)} \big\rangle - \big\langle \mu_{(k)} (e_{(k)} - e_{(k-1)}), \hat w_{(k)} - w_{(k)} \big\rangle \qquad (36)$$

The second and third terms approach zero due to the boundedness of $\{\lambda_{(k)}\}$ and $\{\hat \lambda_{(k)}\}$. The last term tends to vanish due to the boundedness of $\{w_{(k)}\}$ and $\{\hat w_{(k)}\}$, together with the assumption $\lim_{k \to \infty} \mu_{(k)} (e_{(k+1)} - e_{(k)}) = 0$. So when $k \to \infty$, Eq. (36) becomes

$$\|w_{(\infty)}\|^2 + \big\|C(y \odot e_{(\infty)})_+\big\|_p \le \mathrm{obj}^* \qquad (37)$$

So $(w_{(\infty)}, b_{(\infty)}, e_{(\infty)})$ is an optimal solution to the Lp-primal SVM problem. ∎
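To make the overall procedure concrete before turning to the experiments, here is a minimal dense-matrix sketch of the inexact SVM-ALM (Algorithm 2) in NumPy. It is an illustration of the update structure, not the authors' implementation: the geometric penalty schedule and the fixed iteration cap are our simplifying stand-ins for the data-driven rule of Eq. (29) and the gradient-based stopping test, and `A` holds the instances as rows (i.e. $A = X^T$):

```python
import numpy as np

def svm_alm_inexact(A, y, C=1.0, p=1, mu0=0.1, rho=1.05, mu_max=1e5, iters=100):
    """Sketch of Algorithm 2 for p in {1, 2} on a dense n-by-d matrix A."""
    n, d = A.shape
    w, b = np.ones(d), 0.0
    lam = np.zeros(n)                         # Lagrange multipliers
    mu = mu0
    for _ in range(iters):
        # 1. e-update: closed forms of Eq. (12) (p=1) / Eq. (13) (p=2)
        t = y - A @ w - b - lam / mu
        g = C / mu
        if p == 1:
            e = np.where(y * t > g, t - y * g, np.where(y * t < 0, t, 0.0))
        else:
            e = np.where(y * t > 0, t / (1.0 + 2.0 * g), t)
        # 2. (w, b)-update: one gradient step with the optimal size of Eq. (18)
        z = y - e - lam / mu
        r = A @ w + b - z                     # residual of the LSR fit, Eq. (14)
        wg = A.T @ r + w / mu                 # gradients of Eq. (16)
        bg = r.sum()
        q = A @ wg + bg                       # matrix-vector product dominates cost
        s = (wg @ wg + bg * bg) / (q @ q + wg @ wg / mu + 1e-12)
        w, b = w - s * wg, b - s * bg
        # 3. multiplier update, Eq. (9)
        lam = lam + mu * (A @ w + b - y + e)
        mu = min(rho * mu, mu_max)            # simplified penalty schedule
    return w, b

# Toy usage on a 1-D separable problem
A = np.array([[2.0], [3.0], [-2.0], [-3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = svm_alm_inexact(A, y, C=1.0, p=1)
```

Each iteration performs three matrix-by-vector products (`A @ w`, `A.T @ r`, `A @ wg`), matching the $O(n\bar d)$ per-iteration cost claimed in Section 2.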
4. Experiments

This paper follows the concepts of reproducible research. All results presented in the manuscript are reproducible using the code and public datasets available online at https://sites.google.com/site/svmalm. All experiments are conducted on an 8-core Intel Xeon X5460 3.16GHz (12M Cache, 1333MHz FSB) Linux server with 32G memory. For all experiments except those in Section 4.3, we use the default value $\epsilon = 0.01$, as in LibLinear. We terminate the algorithms when the objectives' changes are less than $10^{-4}$. In our method, we empirically set the maximum iteration number to 100, because in all our experiments our algorithm converges within 100 iterations.

We use a set of popularly adopted benchmark datasets from various sources for performance evaluation: UCI Forest (Collobert et al., 2002) (n = 581012, d = 54), ijcnn1 (Chang & Lin, 2001) (n = 191681, d = 22), Webpage (Platt, 1999) (n = 64700, d = 300), UCI Connect-4 (Frank & Asuncion, 2010) (n = 67557, d = 126), SensIT Vehicle (acoustic/seismic) (Duarte & Hu, 2004) (both n = 98528, d = 50), Shuttle (Hsu & Lin, 2002) (n = 58000, d = 9), UCI Poker (Frank & Asuncion, 2010) (n = 1025010, d = 10), and Epsilon (Sonnenburg et al., 2008) (n = 500000, d = 2000). The Epsilon dataset has very dense features and was used in many previous large-scale data classifications. Five-fold cross validation is conducted (except in Section 4.3, where all samples are used for training), as in (Chang et al., 2008). For multi-class classification, we follow the default one-versus-the-rest strategy of (Chang & Lin, 2011) and (Fan et al., 2008), and simply rely on the existing modules in the LibLinear software toolbox. The average training time is reported.

4.1. How Does Training Time Vary with n?

Fig. 1 shows log-log plots of how the CPU time used for training increases with respect to n, the number of training samples. Because when n is small the training time is too short to be measured accurately, we run each test 10 times and report the total training time in Fig. 1. Lines in a log-log plot correspond to polynomial growth $O(n^l)$, where $l$ corresponds to the slope of the line. It is seen from Fig. 1 that the training time of both the exact SVM-ALM and the inexact SVM-ALM is roughly linear with respect to n, since the slopes of the lines representing the various datasets are very close to 1. Together with the theoretical analysis in Section 2 that one iteration of the inexact SVM-ALM algorithm costs $O(n \bar{d})$, Algorithm 2 is shown to be a linear computational cost solver for the Lp-primal SVM. Note that an advantage of our algorithms is that the training time (and obviously the testing time as well) is completely independent of the weight C and the norm p.

[Figure 1: four log-log panels (L1-primal / L2-primal SVM, each solved by the inexact and the exact SVM-ALM) plotting training time (seconds) against the number of training samples for each benchmark dataset.]
Figure 1. Training time of the proposed exact SVM-ALM (Algorithm 1) and inexact SVM-ALM (Algorithm 2) as a function of n.

4.2. Prediction Accuracy Comparison between Exact and Inexact SVM-ALM Algorithms

A natural drawback of the inexact SVM-ALM algorithm is that it still requires $\mu$ to augment to infinity for obtaining the exact optimal solution, as analyzed in the proof of Theorem 2. This property is similar to the IT algorithms (Wright et al., 2009). However, owing to the precision limit discussed in Section 2, $\mu$ cannot increase to infinity in practical implementations of the inexact SVM-ALM Algorithm 2. So a potential concern is that the speedup of the inexact SVM-ALM over the exact SVM-ALM comes at the expense of prediction accuracy; but this is not the case in fact, as verified experimentally in this subsection.

Fig. 2 shows the difference in prediction accuracy between the classification models produced by the inexact SVM-ALM and the exact SVM-ALM. For better readability, the C axis is plotted in log scale, and the difference is shown in percentage points. A positive value indicates that the inexact SVM-ALM has higher prediction accuracy, while a negative value indicates that the exact SVM-ALM performs better. For almost all values of C, both algorithms perform almost identically. In particular, there is no indication that the models learned by the inexact SVM-ALM are less accurate. On the contrary, the prediction accuracy of the inexact SVM-ALM may even be slightly better than that of the exact SVM-ALM; such a phenomenon is reasonable, because it has been reported that some implementations of SVM solvers achieve higher accuracy before the objective function reaches its minimum (Chang et al., 2008).

4.3. Training Time Comparison

The proposed Algorithm 2 is compared with the state-of-the-art solvers SVMperf, Pegasos, BMRM (Bundle Method for Regularized Risk Minimization) (Teo et al., 2010), and the LibLinear solvers built on the TRON, PCD and DCD algorithms. The L1-primal SVM cannot be solved by the PCD (Chang et al., 2008), because its objective function Eq. (2) is non-differentiable; thus the PCD is missing from the test for the L1-primal SVM. As a convention (Joachims, 2006; Shalev-Shwartz et al., 2007; Chang et al., 2008; Hsieh et al., 2008; Lin et al., 2008), SVMperf, Pegasos and the TRON method are typically only tested for the L1-primal SVM.

Because the TRON, PCD and DCD algorithms do not support the bias term b, we extend each instance by an additional dimension with a large constant $T = 10^3$, as instructed in (Hsieh et al., 2008; Lin et al., 2008). As long as the constant T in the additional dimension is sufficiently large, such a conversion is equivalent to supporting the training of the bias term b. With the same settings as in (Chang et al., 2008) and (Hsieh et al., 2008), we compare the L1-SVM and L2-SVM solvers in terms of the training time needed to reduce the objective function obj so that its relative difference from the optimum $\mathrm{obj}^*$, i.e. $(\mathrm{obj} - \mathrm{obj}^*)/\mathrm{obj}^*$, is within 0.01. In order to obtain the reference solutions, we run TRON with a strict stopping condition on $\|\nabla \mathrm{obj}(w)\|$. Since the objective functions are stable under such strict stopping conditions, these solutions are seen to be very close to the ground-truth optima. The results are listed in Tables 1 and 2, from which it is seen that the proposed algorithm has stable performance and is on average faster than its competitors. The advantage of the proposed algorithm is more obvious for large datasets, such as the UCI Forest, SensIT Vehicle, and UCI Poker datasets. The DCD algorithm is not stable, as it may get stuck on some test cases but converge extremely fast on others. When the dimensionality of features increases to 2000, as in the Epsilon data, our algorithm still performs well, and is the fastest solver for L1-SVM and the second fastest solver for L2-SVM.

4.4. The Optimal p for the Lp-Primal SVM

A natural advantage of our proposed algorithm is that it can solve the primal SVM with Lp-norm loss functions for any $p \ge 1$. It is not difficult to understand that it would be merely coincidental for either $p = 1$ or $p = 2$ to make the prediction accuracy of the Lp-primal SVM the highest among all possible p values.

[Figure 2: two panels (L1-primal and L2-primal SVM) plotting the prediction-accuracy difference (percentage points) against C, in log scale, for each benchmark dataset.]
Figure 2. Prediction accuracy difference between the inexact SVM-ALM (Algorithm 2) and the exact SVM-ALM (Algorithm 1) for L1-primal and L2-primal SVMs as a function of C.

Table 1. The training time (seconds) for an L1-SVM solver to reduce obj to within 1% of the optimal value. Though the training time of the proposed algorithms is independent of C, the training time of SVMperf, TRON, PCD and DCD may be affected by C. Following (Chang et al., 2008) and (Hsieh et al., 2008), we set C = 1 for fair comparison. The training time is measured and averaged over 10 runs. The solver with the shortest running time is boldfaced.
DATASET      OUR    PEGASOS  SVMperf  DCD    BMRM
FOREST       4.1    74.1     139.2    >500   51.8
IJCNN1       3.2    87.9     105.6    7.8    63.5
WEBPAGE      4.6    38.3     62.1     3.6    30.2
CONNECT-4    2.6    54.2     122.6    >500   42.9
SENSIT (A)   3.9    128.7    399.8    17.0   102.5
SENSIT (S)   3.9    109.3    335.9    11.1   85.2
SHUTTLE      1.2    29.6     66.6     2.2    20.6
POKER        4.9    107.4    303.1    >500   80.6
EPSILON      31.1   396.4    >500     93.2   315.2

Thus we conduct an interesting experiment showing this phenomenon. Because existing SVM solvers cannot solve the Lp-primal SVM for $p \ne 1$ or $2$, we believe that we are the first to report such results, in Table 3.

5. Conclusion

This paper proposed a novel linear computational cost primal SVM solver using the ALM algorithm for both the L1-norm and the L2-norm loss functions. To avoid the difficulty of dealing with piecewise loss functions, an auxiliary vector is introduced such that in each iteration the auxiliary vector and the support vector are alternately optimized along the direction of the Lagrange multipliers. In extensive experiments, our approach is consistently faster than other state-of-the-art solvers. From the methodological perspective, the proposed algorithm is novel and totally different from the existing literature.

Table 2. The training time (seconds) for an L2-SVM solver to reduce obj to within 1% of the optimal value when C = 1, the same as in Table 1. The training time is measured and averaged over 10 runs. The solver with the shortest running time is boldfaced.

DATASET      OUR    TRON   PCD   DCD    BMRM
FOREST       3.9    92.3   10.0  >500   50.6
IJCNN1       3.2    7.7    3.4   7.5    64.2
WEBPAGE      4.4    2.2    0.9   3.9    32.1
CONNECT-4    2.7    10.4   3.9   >500   39.7
SENSIT (A)   3.9    27.7   5.3   17.5   99.8
SENSIT (S)   3.7    28.1   4.9   10.9   86.1
SHUTTLE      1.2    3.6    0.9   2.4    21.1
POKER        5.1    59.7   7.1   >500   79.8
EPSILON      32.6   241.9  16.9  83.2   329.2

Table 3. Prediction accuracy of L1-SVM, L2-SVM and Lp-SVM, where p is tuned by trying the parameter set {1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2}.
DATASET      L1-SVM  L2-SVM  Lp-SVM  p
FOREST       68.1%   65.3%   71.0%   1.3
IJCNN1       67.3%   74.2%   74.6%   1.9
WEBPAGE      57.3%   59.7%   63.4%   1.6
CONNECT-4    49.3%   44.9%   51.8%   1.2
SENSIT (A)   43.5%   45.9%   47.3%   1.8
SENSIT (S)   41.6%   42.4%   46.8%   1.6
SHUTTLE      35.9%   29.7%   36.1%   1.1
POKER        31.5%   33.8%   36.9%   1.7
EPSILON      42.9%   40.3%   44.6%   1.4

Acknowledgments

Corresponding Author: Heng Huang (heng@uta.edu). This work was partially supported by US NSF IIS-1117965, IIS-1302675, IIS-1344152.

References

Bottou, Léon. Stochastic gradient descent examples, 2007. http://leon.bottou.org/projects/sgd.

Chang, Chih-Chung and Lin, Chih-Jen. IJCNN 2001 challenge: Generalization ability and text decoding. In IJCNN, pp. 1031–1036, 2001.

Chang, Chih-Chung and Lin, Chih-Jen. LIBSVM: A library for support vector machines. ACM TIST, 2(3):27:1–27:27, 2011. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm.

Chang, Kai-Wei, Hsieh, Cho-Jui, and Lin, Chih-Jen. Coordinate descent method for large-scale L2-loss linear SVM. JMLR, 9:1369–1398, 2008.

Collobert, R., Bengio, S., and Bengio, Y. A parallel mixture of SVMs for very large scale problems. Neural Computation, 14(5):1105–1114, 2002.

Duarte, M. and Hu, Y. H. Vehicle classification in distributed sensor networks. JPDC, 64(7):826–838, 2004.

Fan, Rong-En, Chang, Kai-Wei, Hsieh, Cho-Jui, Wang, Xiang-Rui, and Lin, Chih-Jen. LIBLINEAR: A library for large linear classification. JMLR, 9:1871–1874, 2008.

Frank, A. and Asuncion, A. UCI machine learning repository, 2010. http://archive.ics.uci.edu/ml.

Gill, Philip E. and Robinson, Daniel P. A primal-dual augmented Lagrangian. Computational Optimization and Applications, 51(1):1–25, 2012.

Hsieh, Cho-Jui, Chang, Kai-Wei, Keerthi, S. Sathiya, Sundararajan, S., and Lin, Chih-Jen. A dual coordinate descent method for large-scale linear SVM. In ICML, pp. 408–415, 2008.

Hsu, Chih-Wei and Lin, Chih-Jen. A comparison of methods for multi-class support vector machines. IEEE TNN, 13(2):415–425, 2002.

Joachims, Thorsten. Training linear SVMs in linear time. In KDD, pp. 217–226, 2006.

Keerthi, S. Sathiya and DeCoste, Dennis. A modified finite Newton method for fast solution of large scale linear SVMs. JMLR, 6:341–361, 2005.

Lin, Chih-Jen, Weng, Ruby C., and Keerthi, S. Sathiya. Trust region Newton method for large-scale logistic regression. JMLR, 9:627–650, 2008.

Lin, Zhouchen, Chen, Minming, Wu, Leqin, and Ma, Yi. The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices. 2009.

Mangasarian, O. L. and Musicant, David R. Lagrangian support vector machines. Journal of Machine Learning Research, 1:161–177, 2001.

Mangasarian, Olvi L. A finite Newton method for classification. Optimization Methods and Software, 17(5):913–929, 2002.

Platt, John C. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning, pp. 185–208, 1999.

Shalev-Shwartz, Shai, Singer, Yoram, and Srebro, Nathan. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, pp. 807–814, 2007.

Smola, Alex J., Vishwanathan, S. V. N., and Le, Quoc. Bundle methods for machine learning. In NIPS, pp. 1377–1384, 2008.

Sonnenburg, Soeren, Franc, Vojtech, Yom-Tov, Elad, and Sebag, Michele. PASCAL large scale learning challenge, 2008. http://largescale.ml.tu-berlin.de.

Teo, Choon Hui, Vishwanathan, S. V. N., Smola, Alex, and Le, Quoc V. Bundle methods for regularized risk minimization. JMLR, 11:311–365, 2010.

Wright, John, Ganesh, Arvind, Rao, Shankar, and Ma, Yi. Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization. In NIPS, pp. 2080–2088, 2009.

Zhang, Tong. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In ICML, pp. 919–926, 2004.