An Accelerated Proximal Coordinate Gradient Method

Qihang Lin, University of Iowa, Iowa City, IA, USA. qihang-lin@uiowa.edu
Zhaosong Lu, Simon Fraser University, Burnaby, BC, Canada. zhaosong@sfu.ca
Lin Xiao, Microsoft Research, Redmond, WA, USA. lin.xiao@microsoft.com

Abstract

We develop an accelerated randomized proximal coordinate gradient (APCG) method for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains improved convergence rates over the state-of-the-art stochastic dual coordinate ascent (SDCA) method.

1 Introduction

Coordinate descent methods have received extensive attention in recent years due to their potential for solving large-scale optimization problems arising from machine learning and other applications. In this paper, we develop an accelerated proximal coordinate gradient (APCG) method for solving convex optimization problems of the form

    minimize_{x ∈ R^N}  F(x) := f(x) + Ψ(x),    (1)

where f is differentiable on dom(Ψ), and Ψ has a block separable structure. More specifically,

    Ψ(x) = Σ_{i=1}^n Ψ_i(x_i),    (2)

where each x_i denotes a sub-vector of x with cardinality N_i, and each Ψ_i : R^{N_i} → R ∪ {+∞} is a closed convex function. We assume that the collection {x_i : i = 1, ..., n} forms a partition of the components of x ∈ R^N. In addition to modeling nonsmooth regularization terms such as Ψ(x) = λ‖x‖_1, this formulation also covers optimization problems with block separable constraints. More precisely, each block constraint x_i ∈ C_i, where C_i is a closed convex set, can be modeled by an indicator function defined as
Ψ_i(x_i) = 0 if x_i ∈ C_i, and +∞ otherwise.

At each iteration, coordinate descent methods choose one block of coordinates x_i to sufficiently reduce the objective value while keeping the other blocks fixed. One common and simple approach for choosing such a block is the cyclic scheme. The global and local convergence properties of the cyclic coordinate descent method have been studied in, for example, [21, 11, 16, 2, 5]. Recently, randomized strategies for choosing the block to update have become more popular. In addition to their theoretical benefits [13, 14, 19], numerous experiments have demonstrated that randomized coordinate descent methods are very powerful for solving large-scale machine learning problems [3, 6, 18, 19].

Inspired by the success of accelerated full gradient methods (e.g., [12, 1, 22]), several recent works applied Nesterov's acceleration schemes to speed up randomized coordinate descent methods. In particular, Nesterov [13] developed an accelerated randomized coordinate gradient method for minimizing unconstrained smooth convex functions, which corresponds to the case of Ψ(x) ≡ 0 in (1). Lu and Xiao [10] gave a sharper convergence analysis of Nesterov's method, and Lee and Sidford [8] developed extensions with weighted random sampling schemes. More recently, Fercoq and Richtárik [4] proposed an APPROX (Accelerated, Parallel and PROXimal) coordinate descent method for solving the more general problem (1) and obtained accelerated sublinear convergence rates, but their method cannot exploit strong convexity to obtain accelerated linear rates.

In this paper, we develop a general APCG method that achieves accelerated linear convergence rates when the objective function is strongly convex. Without the strong convexity assumption, our method recovers the APPROX method [4]. Moreover, we show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains faster convergence rates than the state-of-the-art stochastic dual coordinate ascent (SDCA) method [19], and the improved iteration complexity matches the accelerated SDCA method [20]. We present numerical experiments to illustrate the advantage of our method.

1.1 Notations and assumptions

For any partition of x ∈ R^N into {x_i ∈ R^{N_i} : i = 1, ..., n}, there is an N × N permutation matrix U partitioned as U = [U_1 ⋯ U_n], where U_i ∈ R^{N × N_i}, such that

    x = Σ_{i=1}^n U_i x_i,  and  x_i = U_i^T x,  i = 1, ..., n.

For any x ∈ R^N, the partial gradient of f with respect to x_i is defined as

    ∇_i f(x) = U_i^T ∇f(x),  i = 1, ..., n.

We associate each subspace R^{N_i}, for i = 1, ..., n, with the standard Euclidean norm, denoted by ‖·‖. We make the following assumptions, which are standard in the literature on coordinate descent methods (e.g., [13, 14]).

Assumption 1. The gradient of the function f is block-wise Lipschitz continuous with constants L_i, i.e.,

    ‖∇_i f(x + U_i h_i) − ∇_i f(x)‖ ≤ L_i ‖h_i‖,  ∀ h_i ∈ R^{N_i},  i = 1, ..., n,  x ∈ R^N.

For convenience, we define the following norm in the whole space R^N:

    ‖x‖_L = ( Σ_{i=1}^n L_i ‖x_i‖² )^{1/2},  ∀ x ∈ R^N.    (3)

Assumption 2. There exists μ ≥ 0 such that for all y ∈ R^N and x ∈ dom(Ψ),

    f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (μ/2) ‖y − x‖²_L.

The convexity parameter of f with respect to the norm ‖·‖_L is the largest μ such that the above inequality holds. Every convex function satisfies Assumption 2 with μ = 0. If μ > 0, the function f is called strongly convex.

We note that an immediate consequence of Assumption 1 is

    f(x + U_i h_i) ≤ f(x) + ⟨∇_i f(x), h_i⟩ + (L_i/2) ‖h_i‖²,  ∀ h_i ∈ R^{N_i},  i = 1, ..., n,  x ∈ R^N.    (4)

This together with Assumption 2 implies μ ≤ 1.

2 The APCG method

In this section we describe the general APCG method, and several variants that are more suitable for implementation under different assumptions. These algorithms extend Nesterov's accelerated gradient methods [12, Section 2.2] to the composite and coordinate descent setting. We first explain the notation used in our algorithms. The algorithms proceed in iterations, with k being the iteration counter. Lowercase letters x, y, z represent vectors in the full space R^N, and x^(k), y^(k) and z^(k) are their values at the kth iteration. Each block coordinate is indicated with a subscript; for example, x^(k)_i represents the value of the ith block of the vector x^(k). The Greek letters α, β, γ are scalars, and α_k, β_k and γ_k represent their values at iteration k.

Algorithm 1: the APCG method
Input: x^(0) ∈ dom(Ψ) and convexity parameter μ ≥ 0.
Initialize: set z^(0) = x^(0) and choose γ₀ ∈ [μ, 1] with γ₀ > 0.
Iterate: repeat for k = 0, 1, 2, ...
1. Compute α_k ∈ (0, 1/n] from the equation

    n² α_k² = (1 − α_k) γ_k + α_k μ,    (5)

and set

    γ_{k+1} = (1 − α_k) γ_k + α_k μ,  β_k = α_k μ / γ_{k+1}.    (6)

2. Compute y^(k) as
    y^(k) = ( α_k γ_k z^(k) + γ_{k+1} x^(k) ) / ( α_k γ_k + γ_{k+1} ).    (7)

3. Choose i_k ∈ {1, ..., n} uniformly at random and compute

    z^(k+1) = argmin_{x ∈ R^N} { (n α_k / 2) ‖x − (1 − β_k) z^(k) − β_k y^(k)‖²_L + ⟨∇_{i_k} f(y^(k)), x_{i_k} − y^(k)_{i_k}⟩ + Ψ_{i_k}(x_{i_k}) }.

4. Set

    x^(k+1) = y^(k) + n α_k (z^(k+1) − z^(k)) + (μ/n) (z^(k) − y^(k)).    (8)

The general APCG method is given as Algorithm 1. At each iteration k, it chooses a random coordinate i_k ∈ {1, ..., n} and generates y^(k), x^(k+1) and z^(k+1). One can observe that x^(k+1) and z^(k+1) depend on the realization of the random variable ξ_k = {i_0, i_1, ..., i_k}, while y^(k) is independent of i_k and only depends on ξ_{k−1}.

To better understand this method, we make the following observations. For convenience, we define

    ~z^(k+1) = argmin_{x ∈ R^N} { (n α_k / 2) ‖x − (1 − β_k) z^(k) − β_k y^(k)‖²_L + ⟨∇f(y^(k)), x − y^(k)⟩ + Ψ(x) },    (9)

which is a full-dimensional update version of Step 3. One can observe that z^(k+1) is updated as

    z^(k+1)_i = ~z^(k+1)_i                          if i = i_k,
    z^(k+1)_i = (1 − β_k) z^(k)_i + β_k y^(k)_i    if i ≠ i_k.    (10)

Notice that from (5), (6), (7) and (8) we have

    x^(k+1) = y^(k) + n α_k ( z^(k+1) − (1 − β_k) z^(k) − β_k y^(k) ),

which together with (10) yields

    x^(k+1)_i = y^(k)_i + n α_k (z^(k+1)_i − z^(k)_i) + (μ/n)(z^(k)_i − y^(k)_i)    if i = i_k,
    x^(k+1)_i = y^(k)_i                                                             if i ≠ i_k.    (11)

That is, in Step 4 we only need to update the block coordinate x^(k+1)_{i_k} and set the rest to be y^(k)_i.

We now state a theorem concerning the expected rate of convergence of the APCG method, whose proof can be found in the full report [9].

Theorem 1. Suppose Assumptions 1 and 2 hold. Let F⋆ be the optimal value of problem (1), and {x^(k)} be the sequence generated by the APCG method. Then, for any k ≥ 0, there holds

    E_{ξ_{k−1}}[F(x^(k))] − F⋆ ≤ min{ (1 − √μ/n)^k, ( 2n / (2n + k√γ₀) )² } ( F(x^(0)) − F⋆ + (γ₀/2) R₀² ),

where

    R₀ := min_{x⋆ ∈ X⋆} ‖x^(0) − x⋆‖_L,    (12)

and X⋆ is the set of optimal solutions of problem (1).

Our result in Theorem 1 improves upon the convergence rates of the proximal coordinate gradient methods in [14, 10], which have convergence rates on the order of

    O( min{ (1 − μ/n)^k, n/(n + k) } ).

For n = 1, our result matches exactly that of the accelerated full gradient method in [12, Section 2.2].

2.1 Two special cases

Here we give two simplified versions of the APCG method, for the special cases of μ = 0 and μ > 0, respectively. Algorithm 2 shows the simplified version for μ = 0, which can be applied to problems without strong convexity, or when the convexity parameter is unknown.
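The parameter update in Step 1 is inexpensive: equation (5) is a scalar quadratic in α_k, so its positive root is available in closed form. The following sketch (the values n = 10, μ = 0.01 and γ₀ = 1 are arbitrary illustrations, not from the paper) iterates (5)–(6) and checks that α_k stays in (0, 1/n] and approaches the fixed point √μ/n:

```python
import math

def apcg_params(gamma_k, mu, n):
    """One step of the APCG parameter recursion, equations (5)-(6).

    Equation (5), n^2 a^2 = (1 - a) gamma_k + a mu, is the scalar quadratic
    n^2 a^2 + (gamma_k - mu) a - gamma_k = 0; we take its positive root.
    """
    b = gamma_k - mu
    alpha = (-b + math.sqrt(b * b + 4.0 * n * n * gamma_k)) / (2.0 * n * n)
    gamma_next = (1.0 - alpha) * gamma_k + alpha * mu    # eq. (6)
    beta = mu * alpha / gamma_next if gamma_next > 0 else 0.0
    return alpha, gamma_next, beta

# Illustration (arbitrary values): n = 10 blocks, mu = 0.01, gamma_0 = 1.
n, mu, gamma = 10, 0.01, 1.0
for _ in range(5000):
    alpha, gamma, beta = apcg_params(gamma, mu, n)
    assert 0.0 < alpha <= 1.0 / n     # alpha_k stays in (0, 1/n]
# The recursion approaches the fixed point gamma_k = mu, alpha_k = sqrt(mu)/n,
# i.e. the constant parameter choice used in Algorithm 3 below.
assert abs(alpha - math.sqrt(mu) / n) < 1e-10
```

Since γ_k ∈ [μ, 1] implies n²α_k² = (1 − α_k)γ_k + α_k μ ≤ 1, the bound α_k ≤ 1/n holds automatically.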
Algorithm 2: APCG with μ = 0
Input: x^(0) ∈ dom(Ψ).
Initialize: set z^(0) = x^(0) and choose α₀ ∈ (0, 1/n].
Iterate: repeat for k = 0, 1, 2, ...
1. Compute y^(k) = (1 − α_k) x^(k) + α_k z^(k).
2. Choose i_k ∈ {1, ..., n} uniformly at random and compute

    z^(k+1)_{i_k} = argmin_{x ∈ R^{N_{i_k}}} { (n α_k L_{i_k} / 2) ‖x − z^(k)_{i_k}‖² + ⟨∇_{i_k} f(y^(k)), x − y^(k)_{i_k}⟩ + Ψ_{i_k}(x) },

and set z^(k+1)_i = z^(k)_i for all i ≠ i_k.
3. Set x^(k+1) = y^(k) + n α_k (z^(k+1) − z^(k)).
4. Compute α_{k+1} = (1/2) ( √(α_k⁴ + 4α_k²) − α_k² ).

According to Theorem 1, Algorithm 2 has an accelerated sublinear convergence rate, that is,

    E_{ξ_{k−1}}[F(x^(k))] − F⋆ ≤ ( 2n / (2n + k n α₀) )² ( F(x^(0)) − F⋆ + (1/2) R₀² ).

With the choice α₀ = 1/n, Algorithm 2 reduces to the APPROX method [4] with single block update at each iteration (i.e., τ = 1 in their Algorithm 1).

For the strongly convex case with μ > 0, we can initialize Algorithm 1 with the parameter γ₀ = μ, which implies γ_k = μ and α_k = β_k = √μ/n for all k ≥ 0. This results in Algorithm 3.

Algorithm 3: APCG with γ₀ = μ > 0
Input: x^(0) ∈ dom(Ψ) and convexity parameter μ > 0.
Initialize: set z^(0) = x^(0) and α = √μ/n.
Iterate: repeat for k = 0, 1, 2, ...
1. Compute y^(k) = (x^(k) + α z^(k)) / (1 + α).
2. Choose i_k ∈ {1, ..., n} uniformly at random and compute

    z^(k+1) = argmin_{x ∈ R^N} { (nα/2) ‖x − (1 − α) z^(k) − α y^(k)‖²_L + ⟨∇_{i_k} f(y^(k)), x_{i_k} − y^(k)_{i_k}⟩ + Ψ_{i_k}(x_{i_k}) }.

3. Set x^(k+1) = y^(k) + nα (z^(k+1) − z^(k)) + nα² (z^(k) − y^(k)).
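To make the method concrete, here is a minimal runnable sketch of Algorithm 3 for scalar blocks (N_i = 1) and Ψ ≡ 0, applied to a randomly generated strongly convex quadratic. With Ψ ≡ 0 the minimization in Step 2 has the closed-form solution z^(k+1)_{i_k} = c_{i_k} − ∇_{i_k}f(y^(k)) / (nαL_{i_k}) with proximal center c = (1 − α)z^(k) + αy^(k), while z^(k+1)_i = c_i for i ≠ i_k. The problem instance and iteration count are illustrative choices, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
# Random strongly convex quadratic f(x) = 0.5 x'Qx - b'x, with Psi = 0 and
# scalar blocks (N_i = 1), so the coordinate prox step has a closed form.
M = rng.standard_normal((n, n))
Q = M @ M.T + 0.5 * np.eye(n)
b = rng.standard_normal(n)
F = lambda x: 0.5 * x @ Q @ x - b @ x
F_star = F(np.linalg.solve(Q, b))

L = np.diag(Q)                       # block-wise Lipschitz constants (Assumption 1)
# Convexity parameter of f w.r.t. ||.||_L: lambda_min of D^{-1/2} Q D^{-1/2}.
mu = np.linalg.eigvalsh(Q / np.sqrt(np.outer(L, L))).min()
alpha = np.sqrt(mu) / n

x = rng.standard_normal(n)
z = x.copy()
gap0 = F(x) - F_star
for _ in range(30000):
    y = (x + alpha * z) / (1.0 + alpha)              # Step 1
    i = rng.integers(n)                              # Step 2: random block
    c = (1.0 - alpha) * z + alpha * y                # proximal center
    z_new = c.copy()
    z_new[i] -= (Q[i] @ y - b[i]) / (n * alpha * L[i])
    x = y + n * alpha * (z_new - z) + n * alpha**2 * (z - y)   # Step 3
    z = z_new
assert F(x) - F_star < 1e-9 * gap0   # linear convergence to the optimum
```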
As a direct corollary of Theorem 1, Algorithm 3 enjoys an accelerated linear convergence rate:

    E_{ξ_{k−1}}[F(x^(k))] − F⋆ ≤ (1 − √μ/n)^k ( F(x^(0)) − F⋆ + (μ/2) R₀² ).

To the best of our knowledge, this is the first time such an accelerated rate has been obtained for solving the general problem (1) (with strong convexity) using coordinate descent type methods.

2.2 Efficient implementation

The APCG methods we presented so far all need to perform full-dimensional vector operations at each iteration. For example, y^(k) is updated as a convex combination of x^(k) and z^(k), and this can be very costly since in general they are dense vectors. Moreover, for the strongly convex case (Algorithms 1 and 3), all blocks of z^(k+1) need to be updated at each iteration, although only the i_k-th block needs to compute the partial gradient and perform a proximal mapping. These full-dimensional vector updates cost O(N) operations per iteration and may cause the overall computational cost of APCG to be even higher than that of the full gradient methods (see discussions in [13]).

In order to avoid full-dimensional vector operations, Lee and Sidford [8] proposed a change-of-variables scheme for accelerated coordinate descent methods for unconstrained smooth minimization. Fercoq and Richtárik [4] devised a similar scheme for efficient implementation in the μ = 0 case for composite minimization. Here we show that such a scheme can also be developed for the case of μ > 0 in the composite optimization setting. For simplicity, we only present an equivalent implementation of the simplified APCG method described in Algorithm 3.
Algorithm 4: Efficient implementation of APCG with γ₀ = μ > 0
Input: x^(0) ∈ dom(Ψ) and convexity parameter μ > 0.
Initialize: set α = √μ/n and ρ = (1 − α)/(1 + α), and initialize u^(0) = 0 and v^(0) = x^(0).
Iterate: repeat for k = 0, 1, 2, ...
1. Choose i_k ∈ {1, ..., n} uniformly at random and compute

    Δ^(k)_{i_k} = argmin_{Δ ∈ R^{N_{i_k}}} { (n α L_{i_k} / 2) ‖Δ‖² + ⟨∇_{i_k} f(ρ^{k+1} u^(k) + v^(k)), Δ⟩ + Ψ_{i_k}( −ρ^{k+1} u^(k)_{i_k} + v^(k)_{i_k} + Δ ) }.

2. Let u^(k+1) = u^(k) and v^(k+1) = v^(k), and update

    u^(k+1)_{i_k} = u^(k)_{i_k} − ((1 − nα)/(2ρ^{k+1})) Δ^(k)_{i_k},  v^(k+1)_{i_k} = v^(k)_{i_k} + ((1 + nα)/2) Δ^(k)_{i_k}.    (13)

Output: x^(k+1) = ρ^{k+1} u^(k+1) + v^(k+1)

The following proposition is proved in the full report [9].

Proposition 1. The iterates of Algorithm 3 and Algorithm 4 satisfy the following relationships:

    x^(k) = ρ^k u^(k) + v^(k),  y^(k) = ρ^{k+1} u^(k) + v^(k),  z^(k) = −ρ^k u^(k) + v^(k).    (14)

We note that in Algorithm 4, only a single block coordinate of the vectors u^(k) and v^(k) is updated at each iteration, which costs O(N_{i_k}). However, computing the partial gradient ∇_{i_k} f(ρ^{k+1} u^(k) + v^(k)) may still cost O(N) in general. In the next section, we show how to further exploit structure in many ERM problems to completely avoid full-dimensional vector operations.

3 Application to regularized empirical risk minimization (ERM)

Let A_1, ..., A_n be vectors in R^d, φ_1, ..., φ_n be a sequence of convex functions defined on R, and g be a convex function on R^d. Regularized ERM aims to solve the following problem:

    minimize_{w ∈ R^d} P(w),  with  P(w) = (1/n) Σ_{i=1}^n φ_i(A_i^T w) + λ g(w),

where λ > 0 is a regularization parameter. For example, given a label b_i ∈ {±1} for each vector A_i, for i = 1, ..., n, we obtain the linear SVM problem by setting φ_i(z) = max{0, 1 − b_i z} and g(w) = (1/2)‖w‖₂². Regularized logistic regression is obtained by setting φ_i(z) = log(1 + exp(−b_i z)). This formulation also includes regression problems. For example, ridge regression is obtained by setting φ_i(z) = (1/2)(z − b_i)² and g(w) = (1/2)‖w‖₂², and we get the Lasso if g(w) = ‖w‖₁.

Let φ_i* be the convex conjugate of φ_i, that is, φ_i*(u) = max_{z ∈ R} (zu − φ_i(z)). The dual of the regularized ERM problem is (see, e.g., [19])

    maximize_{x ∈ R^n} D(x),  with  D(x) = (1/n) Σ_{i=1}^n −φ_i*(−x_i) − λ g*( (1/(λn)) A x ),

where A = [A_1, ..., A_n]. This is equivalent to minimizing F(x) := −D(x), that is,

    minimize_{x ∈ R^n} F(x) := (1/n) Σ_{i=1}^n φ_i*(−x_i) + λ g*( (1/(λn)) A x ).

The structure of F(x) above matches the formulation in (1) and (2) with f(x) = λ g*((1/(λn)) Ax) and Ψ_i(x_i) = (1/n) φ_i*(−x_i), and we can apply the APCG method to minimize F(x). In order to exploit the fast linear convergence rate, we make the following assumption.

Assumption 3. Each function φ_i is 1/γ-smooth, and the function g has convexity parameter 1.

Here we slightly abuse notation by overloading γ, which also appeared in Algorithm 1. But in this section it solely represents the (inverse) smoothness parameter of φ_i. Assumption 3 implies that each φ_i* has strong convexity parameter γ (with respect to the local Euclidean norm) and that g* is differentiable with ∇g* having Lipschitz constant 1.

In the following, we split the function F(x) = f(x) + Ψ(x) by relocating the strong convexity term as follows:

    f(x) = λ g*((1/(λn)) Ax) + (γ/(2n)) ‖x‖²,  Ψ(x) = (1/n) Σ_{i=1}^n ( φ_i*(−x_i) − (γ/2) ‖x_i‖² ).    (15)

As a result, the function f is strongly convex and each Ψ_i is still convex. Now we can apply the APCG method to minimize F(x) = −D(x), and obtain the following guarantee.

Theorem 2. Suppose Assumption 3 holds and ‖A_i‖ ≤ R for all i = 1, ..., n. In order to obtain an expected dual optimality gap E[D⋆ − D(x^(k))] ≤ ε by using the APCG method, it suffices to have

    k ≥ ( n + √( n R² / (λγ) ) ) log(C/ε),    (16)

where D⋆ = max_{x ∈ R^n} D(x) and the constant C = D⋆ − D(x^(0)) + (γ/(2n)) ‖x^(0) − x⋆‖².

Proof. The function f(x) in (15) has coordinate Lipschitz constants

    L_i = ‖A_i‖²/(λn²) + γ/n ≤ (R² + λγn)/(λn²)

and convexity parameter γ/n with respect to the unweighted Euclidean norm. The strong convexity parameter of f(x) with respect to the norm ‖·‖_L defined in (3) is

    μ = (γ/n) / ( (R² + λγn)/(λn²) ) = λγn / (R² + λγn).

According to Theorem 1, we have E[D⋆ − D(x^(k))] ≤ (1 − √μ/n)^k C ≤ exp(−k√μ/n) C. Therefore it suffices to have the number of iterations k larger than

    (n/√μ) log(C/ε) = √( n(R² + λγn)/(λγ) ) log(C/ε) = √( n² + nR²/(λγ) ) log(C/ε) ≤ ( n + √(nR²/(λγ)) ) log(C/ε).

This finishes the proof.
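As a sanity check on the change of variables behind Algorithm 4, Proposition 1 can be verified numerically: running Algorithm 3 and Algorithm 4 on the same strongly convex quadratic (Ψ ≡ 0, scalar blocks, as in the earlier sketch) with the same coordinate sequence should produce the same iterates up to rounding. A sketch under these illustrative assumptions:

```python
import numpy as np

def run_pair(n=15, iters=200, seed=3):
    """Run Algorithm 3 and Algorithm 4 on the same random strongly convex
    quadratic (Psi = 0, scalar blocks) with the same coordinate sequence,
    and return the final iterate produced by each."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n, n))
    Q = M @ M.T + 0.5 * np.eye(n)
    b = rng.standard_normal(n)
    L = np.diag(Q)
    mu = np.linalg.eigvalsh(Q / np.sqrt(np.outer(L, L))).min()
    alpha = np.sqrt(mu) / n
    rho = (1.0 - alpha) / (1.0 + alpha)
    idx = rng.integers(n, size=iters)          # shared random block sequence
    x0 = np.ones(n)
    grad_i = lambda y, i: Q[i] @ y - b[i]

    # Algorithm 3: explicit x^(k), z^(k) iterates.
    x, z = x0.copy(), x0.copy()
    for i in idx:
        y = (x + alpha * z) / (1.0 + alpha)
        c = (1.0 - alpha) * z + alpha * y
        z_new = c.copy()
        z_new[i] -= grad_i(y, i) / (n * alpha * L[i])
        x = y + n * alpha * (z_new - z) + n * alpha**2 * (z - y)
        z = z_new

    # Algorithm 4: u^(k), v^(k) representation, updates (13).
    u, v = np.zeros(n), x0.copy()
    for k, i in enumerate(idx):
        r = rho ** (k + 1)
        delta = -grad_i(r * u + v, i) / (n * alpha * L[i])  # prox step, Psi = 0
        u[i] -= (1.0 - n * alpha) / (2.0 * r) * delta
        v[i] += (1.0 + n * alpha) / 2.0 * delta
    return x, rho ** iters * u + v             # Proposition 1: x^(k) = rho^k u + v

x3, x4 = run_pair()
assert np.max(np.abs(x3 - x4)) < 1e-6
```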
Several state-of-the-art algorithms for ERM, including SDCA [19], SAG [15, 17] and SVRG [7, 23], obtain the iteration complexity

    O( ( n + R²/(λγ) ) log(1/ε) ).    (17)

We note that our result in (16) can be much better for ill-conditioned problems, i.e., when the condition number R²/(λγ) is larger than n. This is also confirmed by our numerical experiments in Section 4.

The complexity bound in (17) for the aforementioned work is for minimizing the primal objective P(w) or the duality gap P(w) − D(x), but our result in Theorem 2 is in terms of the dual optimality. In the full report [9], we show that the same guarantee on accelerated primal-dual convergence can be obtained by our method with an extra primal gradient step, without affecting the overall complexity. The experiments in Section 4 illustrate superior performance of our algorithm in reducing the primal objective value, even without performing the extra step.

We note that Shalev-Shwartz and Zhang [20] recently developed an accelerated SDCA method which achieves the same complexity O( (n + √(n/(λγ))) log(1/ε) ) as our method. Their method calls the SDCA method within a full-dimensional accelerated gradient method in an inner-outer iteration procedure. In contrast, our APCG method is a straightforward single-loop coordinate gradient method.

3.1 Implementation details

Here we show how to exploit the structure of the regularized ERM problem to efficiently compute the coordinate gradient ∇_{i_k} f(y^(k)), and totally avoid full-dimensional updates in Algorithm 4. We focus on the special case g(w) = (1/2)‖w‖² and show how to compute ∇_{i_k} f(y^(k)). According to (15),

    ∇_{i_k} f(y^(k)) = (1/(λn²)) A_{i_k}^T (A y^(k)) + (γ/n) y^(k)_{i_k}.

Since we do not form y^(k) in Algorithm 4, we update A y^(k) by storing and updating two vectors in R^d: p^(k) = A u^(k) and q^(k) = A v^(k). The resulting method is detailed in Algorithm 5.
Algorithm 5: APCG for solving the dual ERM problem
Input: x^(0) ∈ dom(Ψ) and convexity parameter μ > 0.
Initialize: set α = √μ/n and ρ = (1 − α)/(1 + α), and let u^(0) = 0, v^(0) = x^(0), p^(0) = 0 and q^(0) = A x^(0).
Iterate: repeat for k = 0, 1, 2, ...
1. Choose i_k ∈ {1, ..., n} uniformly at random, and compute the coordinate gradient

    ∇^(k)_{i_k} = (1/(λn²)) ( ρ^{k+1} A_{i_k}^T p^(k) + A_{i_k}^T q^(k) ) + (γ/n) ( ρ^{k+1} u^(k)_{i_k} + v^(k)_{i_k} ).

2. Compute the coordinate increment

    Δ^(k)_{i_k} = argmin_{Δ ∈ R^{N_{i_k}}} { ( α(‖A_{i_k}‖² + λγn) / (2λn) ) ‖Δ‖² + ⟨∇^(k)_{i_k}, Δ⟩ + (1/n) φ*_{i_k}( −ρ^{k+1} u^(k)_{i_k} − v^(k)_{i_k} − Δ ) }.

3. Let u^(k+1) = u^(k) and v^(k+1) = v^(k), and update

    u^(k+1)_{i_k} = u^(k)_{i_k} − ((1 − nα)/(2ρ^{k+1})) Δ^(k)_{i_k},  v^(k+1)_{i_k} = v^(k)_{i_k} + ((1 + nα)/2) Δ^(k)_{i_k},
    p^(k+1) = p^(k) − ((1 − nα)/(2ρ^{k+1})) A_{i_k} Δ^(k)_{i_k},  q^(k+1) = q^(k) + ((1 + nα)/2) A_{i_k} Δ^(k)_{i_k}.    (18)

Output: approximate primal and dual solutions

    w^(k+1) = (1/(λn)) ( ρ^{k+2} p^(k+1) + q^(k+1) ),  x^(k+1) = ρ^{k+1} u^(k+1) + v^(k+1).

Each iteration of Algorithm 5 only involves the two inner products A_{i_k}^T p^(k) and A_{i_k}^T q^(k) in computing ∇^(k)_{i_k}, and the two vector additions in (18). They all cost O(d) rather than O(n). When the A_i's are sparse (the case in most large-scale problems), these operations can be carried out very efficiently. Basically, each iteration of Algorithm 5 only costs twice as much as that of SDCA [6, 19].

4 Experiments

In our experiments, we solve ERM problems with the smoothed hinge loss for binary classification. That is, we pre-multiply each feature vector A_i by its label b_i ∈ {±1} and use the loss function

    φ_i(a) = 0                    if a ≥ 1,
    φ_i(a) = 1 − a − γ/2          if a ≤ 1 − γ,
    φ_i(a) = (1/(2γ))(1 − a)²     otherwise.

The conjugate function of φ_i is φ_i*(b) = b + (γ/2)b² if b ∈ [−1, 0], and +∞ otherwise. Therefore we have

    Ψ_i(x_i) = (1/n) ( φ_i*(−x_i) − (γ/2) ‖x_i‖² ) = −x_i/n if x_i ∈ [0, 1], and +∞ otherwise.

The datasets used in our experiments are summarized in Table 1.

[Figure 1 omitted: a 4 × 3 grid of convergence plots; columns correspond to the datasets rcv1, covtype and news20, and rows to λ = 10⁻⁵, 10⁻⁶, 10⁻⁷, 10⁻⁸, each comparing AFG, SDCA and APCG.]

Figure 1: Comparing the APCG method with SDCA and the accelerated full gradient method (AFG) with adaptive line search. In each plot, the vertical axis is the primal objective gap P(w^(k)) − P⋆, and the horizontal axis is the number of passes through the entire dataset. The three columns correspond to the three datasets, and each row corresponds to a particular value of the regularization parameter λ.

In our experiments, we compare the APCG method with SDCA and the accelerated full gradient method (AFG) [12] with an additional line search procedure to improve efficiency. When the regularization parameter λ is not too small (around 10⁻⁴), APCG performs similarly to SDCA, as predicted by our complexity results, and both outperform AFG by a substantial margin. Figure 1 shows the results in the ill-conditioned setting, with λ varying from 10⁻⁵ to 10⁻⁸. Here we see that APCG has superior performance in reducing the primal objective value compared with SDCA and AFG, even though our theory only gives complexity guarantees for solving the dual ERM problem. AFG eventually catches up for cases with a very large condition number (see the plots for λ = 10⁻⁸).

    dataset    number of samples n    number of features d    sparsity
    rcv1       20,242                 47,236                  0.16%
    covtype    581,012                54                      22%
    news20     19,996                 1,355,191               0.04%

Table 1: Characteristics of the three binary classification datasets (available from the LIBSVM web page: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets).
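The conjugate pair above can be checked numerically against the definition φ*(b) = sup_a {ab − φ(a)}; the sketch below (with the arbitrary smoothing value γ = 0.5) also confirms that φ*(−x) − (γ/2)x² reduces to −x on [0, 1], as claimed:

```python
import numpy as np

gamma = 0.5   # smoothing parameter (arbitrary illustrative value)

def phi(a):
    """Smoothed hinge loss from Section 4."""
    a = np.asarray(a, dtype=float)
    return np.where(a >= 1.0, 0.0,
           np.where(a <= 1.0 - gamma, 1.0 - a - gamma / 2.0,
                    (1.0 - a) ** 2 / (2.0 * gamma)))

def phi_conj(b):
    """Its conjugate: b + (gamma/2) b^2 on [-1, 0], +infinity outside."""
    b = np.asarray(b, dtype=float)
    return np.where((b >= -1.0) & (b <= 0.0),
                    b + gamma / 2.0 * b ** 2, np.inf)

# Compare phi_conj with the definition phi*(b) = sup_a { a b - phi(a) }
# on a dense grid (the sup is attained at a = 1 + gamma*b in [1-gamma, 1]).
a_grid = np.linspace(-10.0, 10.0, 200001)
for b in np.linspace(-1.0, 0.0, 21):
    sup = np.max(a_grid * b - phi(a_grid))
    assert abs(sup - float(phi_conj(b))) < 1e-6

# Psi_i(x) = (1/n)(phi*(-x) - (gamma/2) x^2) reduces to -x/n on [0, 1].
for x in np.linspace(0.0, 1.0, 11):
    assert abs(float(phi_conj(-x)) - gamma / 2.0 * x ** 2 - (-x)) < 1e-12
```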
References

[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[2] A. Beck and L. Tetruashvili. On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 23(4):2037–2060, 2013.
[3] K.-W. Chang, C.-J. Hsieh, and C.-J. Lin. Coordinate descent method for large-scale L2-loss linear support vector machines. Journal of Machine Learning Research, 9:1369–1398, 2008.
[4] O. Fercoq and P. Richtárik. Accelerated, parallel and proximal coordinate descent. Manuscript, arXiv:1312.5799, 2013.
[5] M. Hong, X. Wang, M. Razaviyayn, and Z. Q. Luo. Iteration complexity analysis of block coordinate descent methods. Manuscript, arXiv:1310.6957, 2013.
[6] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 408–415, 2008.
[7] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems 26, pages 315–323, 2013.
[8] Y. T. Lee and A. Sidford. Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. arXiv:1305.1922.
[9] Q. Lin, Z. Lu, and L. Xiao. An accelerated proximal coordinate gradient method and its application to regularized empirical risk minimization. Technical Report MSR-TR-2014-94, Microsoft Research, 2014. (arXiv:1407.1296).
[10] Z. Lu and L. Xiao. On the complexity analysis of randomized block-coordinate descent methods. Accepted by Mathematical Programming, Series A, 2014. (arXiv:1305.4723).
[11] Z. Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory & Applications, 72(1):7–35, 2002.
[12] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, 2004.
[13] Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012.
[14] P. Richtárik and M. Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(1):1–38, 2014.
[15] N. Le Roux, M. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems 25, pages 2672–2680, 2012.
[16] A. Saha and A. Tewari. On the non-asymptotic convergence of cyclic coordinate descent methods. SIAM Journal on Optimization, 23:576–601, 2013.
[17] M. Schmidt, N. Le Roux, and F. Bach. Minimizing finite sums with the stochastic average gradient. Technical Report HAL 00860051, INRIA, Paris, France, 2013.
[18] S. Shalev-Shwartz and A. Tewari. Stochastic methods for ℓ1 regularized loss minimization. In Proceedings of the 26th International Conference on Machine Learning (ICML), pages 929–936, Montreal, Canada, 2009.
[19] S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14:567–599, 2013.
[20] S. Shalev-Shwartz and T. Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In Proceedings of the 31st International Conference on Machine Learning (ICML), JMLR W&CP, 32(1):64–72, 2014.
[21] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 140:513–535, 2001.
[22] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Unpublished manuscript, 2008.
[23] L. Xiao and T. Zhang. A proximal stochastic gradient method with progressive variance reduction. Technical Report MSR-TR-2014-38, Microsoft Research, 2014. (arXiv:1403.4699).