
An Accelerated Proximal Coordinate Gradient Method

Qihang Lin, University of Iowa, Iowa City, IA, USA, qihang-lin@uiowa.edu
Zhaosong Lu, Simon Fraser University, Burnaby, BC, Canada, zhaosong@sfu.ca
Lin Xiao, Microsoft Research, Redmond, WA, USA, lin.xiao@microsoft.com

Abstract

We develop an accelerated randomized proximal coordinate gradient (APCG) method for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains better convergence rates than the state-of-the-art stochastic dual coordinate ascent (SDCA) method.

1 Introduction

Coordinate descent methods have received extensive attention in recent years due to their potential for solving large-scale optimization problems arising from machine learning and other applications. In this paper, we develop an accelerated proximal coordinate gradient (APCG) method for solving convex optimization problems of the following form:
$$\mathop{\mathrm{minimize}}_{x\in\mathbb{R}^N}\quad F(x) \;\overset{\mathrm{def}}{=}\; f(x) + \Psi(x), \qquad (1)$$
where $f$ is differentiable on $\mathrm{dom}(\Psi)$, and $\Psi$ has a block separable structure. More specifically,
$$\Psi(x) = \sum_{i=1}^{n} \Psi_i(x_i), \qquad (2)$$
where each $x_i$ denotes a sub-vector of $x$ with cardinality $N_i$, and each $\Psi_i:\mathbb{R}^{N_i}\to\mathbb{R}\cup\{+\infty\}$ is a closed convex function. We assume the collection $\{x_i : i=1,\ldots,n\}$ forms a partition of the components of $x\in\mathbb{R}^N$. In addition to the capability of modeling nonsmooth regularization terms such as $\Psi(x)=\|x\|_1$, this model also includes optimization problems with block separable constraints. More precisely, each block constraint $x_i\in C_i$, where $C_i$ is a closed convex set, can be modeled by an indicator function defined as $\Psi_i(x_i)=0$ if $x_i\in C_i$ and $+\infty$ otherwise.
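The block separability in (2) is what makes coordinate-wise proximal steps cheap: the proximal mapping of $\Psi$ decomposes into independent proximal mappings of the $\Psi_i$. The following sketch is not part of the paper; the block layout, the choice $\Psi_i(x_i)=\tau\|x_i\|_1$, and the box constraint are purely illustrative assumptions.

```python
import numpy as np

# Illustrative partition of x in R^N (N = 8) into n = 3 contiguous blocks.
# The paper only requires that the blocks form a partition of the components of x.
blocks = [np.arange(0, 3), np.arange(3, 5), np.arange(5, 8)]

def psi(x, tau=0.1):
    """Block-separable regularizer Psi(x) = sum_i tau*||x_i||_1 (illustrative choice)."""
    return sum(tau * np.abs(x[blk]).sum() for blk in blocks)

def prox_psi_block(x_i, step, tau=0.1):
    """Proximal mapping of one block term Psi_i = tau*||.||_1, i.e. soft-thresholding.
    Because Psi is block separable, the prox of Psi is just this applied block by block."""
    return np.sign(x_i) * np.maximum(np.abs(x_i) - step * tau, 0.0)

def prox_indicator_box(x_i, lo=-1.0, hi=1.0):
    """A block constraint x_i in C_i is handled the same way: the prox of the
    indicator function of C_i is the projection onto C_i (here a box)."""
    return np.clip(x_i, lo, hi)
```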
At each iteration, coordinate descent methods choose one block of coordinates $x_i$ to sufficiently reduce the objective value while keeping the other blocks fixed. One common and simple approach for choosing such a block is the cyclic scheme. The global and local convergence properties of the cyclic coordinate descent method have been studied in, for example, [21, 11, 16, 2, 5]. Recently, randomized strategies for choosing the block to update have become more popular. In addition to their theoretical benefits [13, 14, 19], numerous experiments have demonstrated that randomized coordinate descent methods are very powerful for solving large-scale machine learning problems [3, 6, 18, 19]. Inspired by the success of accelerated full gradient methods (e.g., [12, 1, 22]), several recent works applied Nesterov's acceleration schemes to speed up randomized coordinate descent methods. In particular, Nesterov [13] developed an accelerated randomized coordinate gradient method for minimizing unconstrained smooth convex functions, which corresponds to the case of $\Psi(x)\equiv 0$ in (1).

Lu and Xiao [10] gave a sharper convergence analysis of Nesterov's method, and Lee and Sidford [8] developed extensions with weighted random sampling schemes. More recently, Fercoq and Richtárik [4] proposed an APPROX (Accelerated, Parallel and PROXimal) coordinate descent method for solving the more general problem (1) and obtained accelerated sublinear convergence rates, but their method cannot exploit strong convexity to obtain accelerated linear rates.

In this paper, we develop a general APCG method that achieves accelerated linear convergence rates when the objective function is strongly convex. Without the strong convexity assumption, our method recovers the APPROX method [4]. Moreover, we show how to apply the APCG method to solve the dual of the regularized empirical risk minimization (ERM) problem, and devise efficient implementations that avoid full-dimensional vector operations. For ill-conditioned ERM problems, our method obtains faster convergence rates than the state-of-the-art stochastic dual coordinate ascent (SDCA) method [19], and the improved iteration complexity matches the accelerated SDCA method [20]. We present numerical experiments to illustrate the advantage of our method.

1.1 Notations and assumptions

For any partition of $x\in\mathbb{R}^N$ into $\{x_i\in\mathbb{R}^{N_i} : i=1,\ldots,n\}$, there is an $N\times N$ permutation matrix $U$ partitioned as $U=[U_1\,\cdots\,U_n]$, where $U_i\in\mathbb{R}^{N\times N_i}$, such that
$$x=\sum_{i=1}^{n} U_i x_i, \qquad\text{and}\qquad x_i = U_i^T x,\quad i=1,\ldots,n.$$
For any $x\in\mathbb{R}^N$, the partial gradient of $f$ with respect to $x_i$ is defined as
$$\nabla_i f(x) = U_i^T \nabla f(x), \qquad i=1,\ldots,n.$$
We associate each subspace $\mathbb{R}^{N_i}$, for $i=1,\ldots,n$, with the standard Euclidean norm, denoted $\|\cdot\|$. We make the following assumptions, which are standard in the literature on coordinate descent methods (e.g., [13, 14]).

Assumption 1. The gradient of the function $f$ is block-wise Lipschitz continuous with constants $L_i$, i.e.,
$$\|\nabla_i f(x+U_i h_i) - \nabla_i f(x)\| \le L_i \|h_i\|, \qquad \forall\, h_i\in\mathbb{R}^{N_i},\ i=1,\ldots,n,\ x\in\mathbb{R}^N.$$
For convenience, we define the following norm in the whole space $\mathbb{R}^N$:
$$\|x\|_L = \Big(\sum_{i=1}^{n} L_i \|x_i\|^2\Big)^{1/2}, \qquad \forall\, x\in\mathbb{R}^N. \qquad (3)$$

Assumption 2. There exists $\mu\ge 0$ such that for all $y\in\mathbb{R}^N$ and $x\in\mathrm{dom}(\Psi)$,
$$f(y) \;\ge\; f(x) + \langle \nabla f(x),\, y-x\rangle + \frac{\mu}{2}\,\|y-x\|_L^2.$$
The convexity parameter of $f$ with respect to the norm $\|\cdot\|_L$ is the largest $\mu$ such that the above inequality holds. Every convex function satisfies Assumption 2 with $\mu=0$. If $\mu>0$, the function $f$ is called strongly convex.

We note that an immediate consequence of Assumption 1 is
$$f(x+U_i h_i) \;\le\; f(x) + \langle \nabla_i f(x),\, h_i\rangle + \frac{L_i}{2}\|h_i\|^2, \qquad \forall\, h_i\in\mathbb{R}^{N_i},\ i=1,\ldots,n,\ x\in\mathbb{R}^N. \qquad (4)$$
This together with Assumption 2 implies $\mu\le 1$.
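As a concrete illustration of Assumption 1 (ours, not taken from the paper): for a quadratic $f(x)=\frac{1}{2}\|Ax-b\|^2$ with column blocks $A_i$, the partial gradient is $\nabla_i f(x)=A_i^T(Ax-b)$ and one may take $L_i=\|A_i\|_2^2$. A minimal sketch, with the matrix, block layout and sizes chosen arbitrarily:

```python
import numpy as np

# Block-wise Lipschitz constants for f(x) = 0.5*||A x - b||^2 (our example):
# grad_i f(x) = A_i^T (A x - b), so L_i = ||A_i||_2^2 for the column block A_i.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 8)), rng.standard_normal(20)
blocks = [np.arange(0, 3), np.arange(3, 5), np.arange(5, 8)]

L = np.array([np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks])

def norm_L(x):
    """The weighted norm ||x||_L from equation (3)."""
    return np.sqrt(sum(L[i] * x[blk] @ x[blk] for i, blk in enumerate(blocks)))
```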
2 The APCG method

In this section we describe the general APCG method, and several variants that are more suitable for implementation under different assumptions. These algorithms extend Nesterov's accelerated gradient methods [12, Section 2.2] to the composite and coordinate descent setting.

We first explain the notations used in our algorithms. The algorithms proceed in iterations, with $k$ being the iteration counter. Lower case letters $x$, $y$, $z$ represent vectors in the full space $\mathbb{R}^N$, and $x^{(k)}$, $y^{(k)}$ and $z^{(k)}$ are their values at the $k$th iteration. Each block coordinate is indicated with a subscript; for example, $x_i^{(k)}$ represents the value of the $i$th block of the vector $x^{(k)}$. The Greek letters $\alpha$, $\beta$, $\gamma$ are scalars, and $\alpha_k$, $\beta_k$ and $\gamma_k$ represent their values at iteration $k$.

Algorithm 1: the APCG method
Input: $x^{(0)}\in\mathrm{dom}(\Psi)$ and convexity parameter $\mu\ge 0$.
Initialize: set $z^{(0)}=x^{(0)}$ and choose $0<\gamma_0\in[\mu,1]$.
Iterate: repeat for $k=0,1,2,\ldots$
1. Compute $\alpha_k\in(0,\frac{1}{n}]$ from the equation
$$n^2\alpha_k^2 = (1-\alpha_k)\gamma_k + \alpha_k\mu, \qquad (5)$$
and set
$$\gamma_{k+1} = (1-\alpha_k)\gamma_k + \alpha_k\mu, \qquad \beta_k = \frac{\alpha_k\mu}{\gamma_{k+1}}. \qquad (6)$$
2. Compute $y^{(k)}$ as
$$y^{(k)} = \frac{1}{\alpha_k\gamma_k + \gamma_{k+1}}\Big(\alpha_k\gamma_k\, z^{(k)} + \gamma_{k+1}\, x^{(k)}\Big). \qquad (7)$$
3. Choose $i_k\in\{1,\ldots,n\}$ uniformly at random and compute
$$z^{(k+1)} = \arg\min_{x\in\mathbb{R}^N}\Big\{ \frac{n\alpha_k}{2}\big\|x-(1-\beta_k)z^{(k)}-\beta_k y^{(k)}\big\|_L^2 + \langle \nabla_{i_k} f(y^{(k)}),\, x_{i_k}\rangle + \Psi_{i_k}(x_{i_k}) \Big\}.$$
4. Set
$$x^{(k+1)} = y^{(k)} + n\alpha_k\big(z^{(k+1)}-z^{(k)}\big) + \frac{\mu}{n}\big(z^{(k)}-y^{(k)}\big). \qquad (8)$$

The general APCG method is given as Algorithm 1. At each iteration $k$, it chooses a random coordinate $i_k\in\{1,\ldots,n\}$ and generates $y^{(k)}$, $x^{(k+1)}$ and $z^{(k+1)}$. One can observe that $x^{(k+1)}$ and $z^{(k+1)}$ depend on the realization of the random variable $\xi_k=\{i_0,i_1,\ldots,i_k\}$, while $y^{(k)}$ is independent of $i_k$ and only depends on $\xi_{k-1}$.

To better understand this method, we make the following observations. For convenience, we define
$$\tilde z^{(k+1)} = \arg\min_{x\in\mathbb{R}^N}\Big\{ \frac{n\alpha_k}{2}\big\|x-(1-\beta_k)z^{(k)}-\beta_k y^{(k)}\big\|_L^2 + \langle \nabla f(y^{(k)}),\, x-y^{(k)}\rangle + \Psi(x)\Big\}, \qquad (9)$$
which is a full-dimensional update version of Step 3. One can observe that $z^{(k+1)}$ is updated as
$$z_i^{(k+1)} = \begin{cases} \tilde z_i^{(k+1)} & \text{if } i=i_k, \\ (1-\beta_k)z_i^{(k)} + \beta_k y_i^{(k)} & \text{if } i\ne i_k. \end{cases} \qquad (10)$$
Notice that from (5), (6), (7) and (8) we have
$$x^{(k+1)} = y^{(k)} + n\alpha_k\Big(z^{(k+1)} - (1-\beta_k)z^{(k)} - \beta_k y^{(k)}\Big),$$
which together with (10) yields
$$x_i^{(k+1)} = \begin{cases} y_i^{(k)} + n\alpha_k\big(z_i^{(k+1)}-z_i^{(k)}\big) + \frac{\mu}{n}\big(z_i^{(k)}-y_i^{(k)}\big) & \text{if } i=i_k, \\ y_i^{(k)} & \text{if } i\ne i_k. \end{cases} \qquad (11)$$
That is, in Step 4 we only need to update the block coordinate $x_{i_k}^{(k+1)}$ and set the rest to be $y_i^{(k)}$.

We now state a theorem concerning the expected rate of convergence of the APCG method, whose proof can be found in the full report [9].

Theorem 1. Suppose Assumptions 1 and 2 hold. Let $F^\star$ be the optimal value of problem (1), and $\{x^{(k)}\}$ be the sequence generated by the APCG method. Then, for any $k\ge 0$, there holds
$$\mathbb{E}_{\xi_{k-1}}\big[F(x^{(k)})\big] - F^\star \;\le\; \min\left\{ \Big(1-\frac{\sqrt{\mu}}{n}\Big)^{k},\ \Big(\frac{2n}{2n+k\sqrt{\gamma_0}}\Big)^{2} \right\} \left( F(x^{(0)}) - F^\star + \frac{\gamma_0}{2}R_0^2 \right),$$
where
$$R_0 \overset{\mathrm{def}}{=} \min_{x^\star\in X^\star}\big\|x^{(0)}-x^\star\big\|_L, \qquad (12)$$
and $X^\star$ is the set of optimal solutions of problem (1).

Our result in Theorem 1 improves upon the convergence rates of the proximal coordinate gradient methods in [14, 10], which have convergence rates on the order of
$$O\left(\min\left\{ \Big(1-\frac{\mu}{n}\Big)^{k},\ \frac{n}{n+k} \right\}\right).$$
For $n=1$, our result matches exactly that of the accelerated full gradient method in [12, Section 2.2].

2.1 Two special cases

Here we give two simplified versions of the APCG method, for the special cases of $\mu=0$ and $\mu>0$, respectively. Algorithm 2 shows the simplified version for $\mu=0$, which can be applied to problems without strong convexity, or if the convexity parameter is unknown.

Algorithm 2: APCG with $\mu=0$
Input: $x^{(0)}\in\mathrm{dom}(\Psi)$.
Initialize: set $z^{(0)}=x^{(0)}$ and choose $\alpha_0\in(0,\frac{1}{n}]$.
Iterate: repeat for $k=0,1,2,\ldots$
1. Compute $y^{(k)} = (1-\alpha_k)x^{(k)} + \alpha_k z^{(k)}$.
2. Choose $i_k\in\{1,\ldots,n\}$ uniformly at random and compute
$$z_{i_k}^{(k+1)} = \arg\min_{x\in\mathbb{R}^{N_{i_k}}}\Big\{ \frac{n\alpha_k L_{i_k}}{2}\big\|x-z_{i_k}^{(k)}\big\|^2 + \langle \nabla_{i_k} f(y^{(k)}),\, x-y_{i_k}^{(k)}\rangle + \Psi_{i_k}(x)\Big\},$$
and set $z_i^{(k+1)} = z_i^{(k)}$ for all $i\ne i_k$.
3. Set $x^{(k+1)} = y^{(k)} + n\alpha_k\big(z^{(k+1)}-z^{(k)}\big)$.
4. Compute $\alpha_{k+1} = \frac{1}{2}\Big(\sqrt{\alpha_k^4 + 4\alpha_k^2} - \alpha_k^2\Big)$.

According to Theorem 1, Algorithm 2 has an accelerated sublinear convergence rate, that is,
$$\mathbb{E}_{\xi_{k-1}}\big[F(x^{(k)})\big] - F^\star \;\le\; \Big(\frac{2n}{2n+kn\alpha_0}\Big)^{2}\left( F(x^{(0)}) - F^\star + \frac{1}{2}R_0^2 \right).$$
With the choice of $\alpha_0=1/n$, Algorithm 2 reduces to the APPROX method [4] with single block update at each iteration (i.e., $\tau=1$ in their Algorithm 1).

For the strongly convex case with $\mu>0$, we can initialize Algorithm 1 with the parameter $\gamma_0=\mu$, which implies $\gamma_k=\mu$ and $\alpha_k=\beta_k=\frac{\sqrt{\mu}}{n}$ for all $k\ge 0$. This results in Algorithm 3.

Algorithm 3: APCG with $\gamma_0=\mu>0$
Input: $x^{(0)}\in\mathrm{dom}(\Psi)$ and convexity parameter $\mu>0$.
Initialize: set $z^{(0)}=x^{(0)}$ and $\alpha=\frac{\sqrt{\mu}}{n}$.
Iterate: repeat for $k=0,1,2,\ldots$
1. Compute $y^{(k)} = \dfrac{x^{(k)}+\alpha z^{(k)}}{1+\alpha}$.
2. Choose $i_k\in\{1,\ldots,n\}$ uniformly at random and compute
$$z^{(k+1)} = \arg\min_{x\in\mathbb{R}^N}\Big\{ \frac{n\alpha}{2}\big\|x-(1-\alpha)z^{(k)}-\alpha y^{(k)}\big\|_L^2 + \langle \nabla_{i_k} f(y^{(k)}),\, x_{i_k}-y_{i_k}^{(k)}\rangle + \Psi_{i_k}(x_{i_k})\Big\}.$$
3. Set $x^{(k+1)} = y^{(k)} + n\alpha\big(z^{(k+1)}-z^{(k)}\big) + n\alpha^2\big(z^{(k)}-y^{(k)}\big)$.

As a direct corollary of Theorem 1, Algorithm 3 enjoys an accelerated linear convergence rate:
$$\mathbb{E}_{\xi_{k-1}}\big[F(x^{(k)})\big] - F^\star \;\le\; \Big(1-\frac{\sqrt{\mu}}{n}\Big)^{k}\left( F(x^{(0)}) - F^\star + \frac{\mu}{2}R_0^2 \right).$$
To the best of our knowledge, this is the first time such an accelerated rate is obtained for solving the general problem (1) (with strong convexity) using coordinate descent type methods.
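To make the update pattern of Algorithm 3 concrete, here is a direct, unoptimized transcription on a toy instance. The problem (a strongly convex quadratic plus a block-wise $\ell_1$ term), the data, and all names are our illustrative assumptions; this is a sketch of the iteration, not the authors' implementation.

```python
import numpy as np

# Toy instance: f(x) = 0.5*||A x - b||^2 + (mu_q/2)*||x||^2, Psi_i = tau*||.||_1 per block.
rng = np.random.default_rng(1)
A, b, tau = rng.standard_normal((30, 8)), rng.standard_normal(30), 0.05
blocks = [np.arange(0, 3), np.arange(3, 5), np.arange(5, 8)]
mu_q = 0.5                                     # makes f strongly convex in the Euclidean norm

def grad_f(x):
    return A.T @ (A @ x - b) + mu_q * x

L = np.array([np.linalg.norm(A[:, blk], 2) ** 2 + mu_q for blk in blocks])
mu = mu_q / L.max()                            # a valid (conservative) convexity parameter w.r.t. ||.||_L

def prox_l1(v, step):                          # prox of tau*||.||_1 with step size `step`
    return np.sign(v) * np.maximum(np.abs(v) - step * tau, 0.0)

def apcg(x0, iters=2000):
    n = len(blocks)
    alpha = np.sqrt(mu) / n                    # Algorithm 3: alpha = sqrt(mu)/n
    x = np.array(x0, dtype=float)
    z = x.copy()
    for _ in range(iters):
        y = (x + alpha * z) / (1.0 + alpha)    # step 1
        i = rng.integers(n)                    # step 2: pick a block uniformly at random
        blk = blocks[i]
        g = grad_f(y)[blk]                     # partial gradient of f at y
        c = (1 - alpha) * z + alpha * y        # prox center; untouched blocks of z move to c
        z_new = c.copy()
        step = 1.0 / (n * alpha * L[i])
        z_new[blk] = prox_l1(c[blk] - step * g, step)
        x = y + n * alpha * (z_new - z) + n * alpha**2 * (z - y)   # step 3
        z = z_new
    return x

# Example: x_hat = apcg(np.zeros(8))
```

Note that this version still manipulates the full vectors $y$ and $z$ at every iteration; avoiding exactly that is the point of Section 2.2 below.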
2.2 Efficient implementation

The APCG methods we presented so far all need to perform full-dimensional vector operations at each iteration. For example, $y^{(k)}$ is updated as a convex combination of $x^{(k)}$ and $z^{(k)}$, and this can be very costly since in general they can be dense vectors. Moreover, for the strongly convex case (Algorithms 1 and 3), all blocks of $z^{(k+1)}$ need to be updated at each iteration, although only the $i_k$-th block needs to compute the partial gradient and perform a proximal mapping. These full-dimensional vector updates cost $O(N)$ operations per iteration and may cause the overall computational cost of APCG to be even higher than the full gradient methods (see the discussions in [13]).

In order to avoid full-dimensional vector operations, Lee and Sidford [8] proposed a change of variables scheme for accelerated coordinate descent methods for unconstrained smooth minimization. Fercoq and Richtárik [4] devised a similar scheme for efficient implementation in the $\mu=0$ case for composite minimization. Here we show that such a scheme can also be developed for the case of $\mu>0$ in the composite optimization setting. For simplicity, we only present an equivalent implementation of the simplified APCG method described in Algorithm 3.

Algorithm 4: Efficient implementation of APCG with $\gamma_0=\mu>0$
Input: $x^{(0)}\in\mathrm{dom}(\Psi)$ and convexity parameter $\mu>0$.
Initialize: set $\alpha=\frac{\sqrt{\mu}}{n}$ and $\rho=\frac{1-\alpha}{1+\alpha}$, and initialize $u^{(0)}=0$ and $v^{(0)}=x^{(0)}$.
Iterate: repeat for $k=0,1,2,\ldots$
1. Choose $i_k\in\{1,\ldots,n\}$ uniformly at random and compute
$$\Delta_{i_k}^{(k)} = \arg\min_{\Delta\in\mathbb{R}^{N_{i_k}}}\Big\{ \frac{n\alpha L_{i_k}}{2}\|\Delta\|^2 + \big\langle \nabla_{i_k} f\big(\rho^{k+1}u^{(k)}+v^{(k)}\big),\, \Delta\big\rangle + \Psi_{i_k}\big(-\rho^{k+1}u_{i_k}^{(k)}+v_{i_k}^{(k)}+\Delta\big) \Big\}.$$
2. Let $u^{(k+1)}=u^{(k)}$ and $v^{(k+1)}=v^{(k)}$, and update
$$u_{i_k}^{(k+1)} = u_{i_k}^{(k)} - \frac{1-n\alpha}{2\rho^{k+1}}\,\Delta_{i_k}^{(k)}, \qquad v_{i_k}^{(k+1)} = v_{i_k}^{(k)} + \frac{1+n\alpha}{2}\,\Delta_{i_k}^{(k)}. \qquad (13)$$
Output: $x^{(k+1)} = \rho^{k+1}u^{(k+1)} + v^{(k+1)}$.

The following proposition is proved in the full report [9].

Proposition 1. The iterates of Algorithm 3 and Algorithm 4 satisfy the following relationships:
$$x^{(k)} = \rho^{k}u^{(k)} + v^{(k)}, \qquad y^{(k)} = \rho^{k+1}u^{(k)} + v^{(k)}, \qquad z^{(k)} = -\rho^{k}u^{(k)} + v^{(k)}. \qquad (14)$$

We note that in Algorithm 4, only a single block coordinate of the vectors $u^{(k)}$ and $v^{(k)}$ is updated at each iteration, which costs $O(N_{i_k})$. However, computing the partial gradient $\nabla_{i_k} f\big(\rho^{k+1}u^{(k)}+v^{(k)}\big)$ may still cost $O(N)$ in general. In the next section, we show how to further exploit structure in many ERM problems to completely avoid full-dimensional vector operations.
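The change of variables in Algorithm 4 can be mirrored almost line by line in code. The sketch below continues the toy example from the Algorithm 3 sketch above (it reuses `grad_f`, `L`, `mu`, `blocks`, `prox_l1` and `tau` defined there) and is, again, only our illustration: it still forms $y^{(k)}$ to evaluate the partial gradient, which Section 3 shows how to avoid for ERM problems.

```python
import numpy as np

def apcg_efficient(x0, iters=500, seed=2):
    """Change-of-variables form of the APCG iteration (our sketch of Algorithm 4).
    Only one block of u and v is touched per iteration; x, y, z are recovered via (14)."""
    rng = np.random.default_rng(seed)
    n = len(blocks)
    alpha = np.sqrt(mu) / n
    rho = (1 - alpha) / (1 + alpha)
    u, v = np.zeros(len(x0)), np.array(x0, dtype=float)
    for k in range(iters):
        i = rng.integers(n)
        blk = blocks[i]
        rho_k1 = rho ** (k + 1)
        y = rho_k1 * u + v                    # y^(k) by (14); forming it costs O(N) here,
        g = grad_f(y)[blk]                    # which Section 3 avoids for ERM problems
        step = 1.0 / (n * alpha * L[i])
        base = -rho_k1 * u[blk] + v[blk]      # the block of (1-alpha)*z^(k) + alpha*y^(k)
        delta = prox_l1(base - step * g, step) - base
        u[blk] -= (1 - n * alpha) / (2 * rho_k1) * delta   # update (13)
        v[blk] += (1 + n * alpha) / 2 * delta
    # Practical caveat (ours, not the paper's): rho**(k+1) decays geometrically, so very
    # long runs would want a periodic rescaling of u and v to stay numerically safe.
    return rho ** iters * u + v               # x^(k) = rho^k * u^(k) + v^(k)
```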
3 Application to regularized empirical risk minimization (ERM)

Let $A_1,\ldots,A_n$ be vectors in $\mathbb{R}^d$, $\phi_1,\ldots,\phi_n$ be a sequence of convex functions defined on $\mathbb{R}$, and $g$ be a convex function on $\mathbb{R}^d$. Regularized ERM aims to solve the following problem:
$$\mathop{\mathrm{minimize}}_{w\in\mathbb{R}^d}\quad P(w), \qquad\text{with}\qquad P(w) = \frac{1}{n}\sum_{i=1}^{n}\phi_i(A_i^T w) + \lambda g(w),$$
where $\lambda>0$ is a regularization parameter. For example, given a label $b_i\in\{\pm 1\}$ for each vector $A_i$, for $i=1,\ldots,n$, we obtain the linear SVM problem by setting $\phi_i(z)=\max\{0,1-b_i z\}$ and $g(w)=(1/2)\|w\|_2^2$. Regularized logistic regression is obtained by setting $\phi_i(z)=\log(1+\exp(-b_i z))$. This formulation also includes regression problems. For example, ridge regression is obtained by setting $\phi_i(z)=(1/2)(z-b_i)^2$ and $g(w)=(1/2)\|w\|_2^2$, and we get the Lasso if $g(w)=\|w\|_1$.

Let $\phi_i^*$ be the convex conjugate of $\phi_i$, that is, $\phi_i^*(u)=\max_{z\in\mathbb{R}}\{zu-\phi_i(z)\}$. The dual of the regularized ERM problem is (see, e.g., [19])
$$\mathop{\mathrm{maximize}}_{x\in\mathbb{R}^n}\quad D(x), \qquad\text{with}\qquad D(x) = -\frac{1}{n}\sum_{i=1}^{n}\phi_i^*(-x_i) - \lambda g^*\Big(\frac{1}{\lambda n}Ax\Big),$$
where $A=[A_1,\ldots,A_n]$. This is equivalent to minimizing $F(x)\overset{\mathrm{def}}{=}-D(x)$, that is,
$$\mathop{\mathrm{minimize}}_{x\in\mathbb{R}^n}\quad F(x) \overset{\mathrm{def}}{=} \frac{1}{n}\sum_{i=1}^{n}\phi_i^*(-x_i) + \lambda g^*\Big(\frac{1}{\lambda n}Ax\Big).$$
The structure of $F(x)$ above matches the formulation in (1) and (2) with $f(x)=\lambda g^*\big(\frac{1}{\lambda n}Ax\big)$ and $\Psi_i(x_i)=\frac{1}{n}\phi_i^*(-x_i)$, and we can apply the APCG method to minimize $F(x)$. In order to exploit the fast linear convergence rate, we make the following assumption.

Assumption 3. Each function $\phi_i$ is $1/\gamma$-smooth, and the function $g$ has convexity parameter 1.

Here we slightly abuse the notation by overloading $\gamma$, which also appeared in Algorithm 1; in this section it solely represents the (inverse) smoothness parameter of $\phi_i$. Assumption 3 implies that each $\phi_i^*$ has strong convexity parameter $\gamma$ (with respect to the local Euclidean norm) and that $g^*$ is differentiable with $\nabla g^*$ having Lipschitz constant 1.

In the following, we split the function $F(x)=f(x)+\Psi(x)$ by relocating the strong convexity term as follows:
$$f(x) = \lambda g^*\Big(\frac{1}{\lambda n}Ax\Big) + \frac{\gamma}{2n}\|x\|^2, \qquad \Psi(x) = \frac{1}{n}\sum_{i=1}^{n}\Big(\phi_i^*(-x_i) - \frac{\gamma}{2}\|x_i\|^2\Big). \qquad (15)$$
As a result, the function $f$ is strongly convex and each $\Psi_i$ is still convex. Now we can apply the APCG method to minimize $F(x)=-D(x)$, and obtain the following guarantee.

Theorem 2. Suppose Assumption 3 holds and $\|A_i\|\le R$ for all $i=1,\ldots,n$. In order to obtain an expected dual optimality gap $\mathbb{E}[D^\star-D(x^{(k)})]\le\epsilon$ by using the APCG method, it suffices to have
$$k \;\ge\; \left(n+\sqrt{\frac{nR^2}{\lambda\gamma}}\right)\log(C/\epsilon), \qquad (16)$$
where $D^\star=\max_{x\in\mathbb{R}^n}D(x)$ and the constant $C=D^\star-D(x^{(0)})+\frac{\gamma}{2n}\|x^{(0)}-x^\star\|^2$.

Proof. The function $f(x)$ in (15) has coordinate Lipschitz constants
$$L_i = \frac{\|A_i\|^2}{\lambda n^2} + \frac{\gamma}{n} \;\le\; \frac{R^2+\lambda\gamma n}{\lambda n^2}$$
and convexity parameter $\frac{\gamma}{n}$ with respect to the unweighted Euclidean norm. The strong convexity parameter of $f(x)$ with respect to the norm $\|\cdot\|_L$ defined in (3) is therefore
$$\mu \;=\; \frac{\gamma}{n}\Big/\frac{R^2+\lambda\gamma n}{\lambda n^2} \;=\; \frac{\lambda\gamma n}{R^2+\lambda\gamma n}.$$
According to Theorem 1, we have
$$\mathbb{E}\big[D^\star-D(x^{(k)})\big] \;\le\; \Big(1-\frac{\sqrt{\mu}}{n}\Big)^{k} C \;\le\; \exp\Big(-\frac{\sqrt{\mu}}{n}\,k\Big)\,C.$$
Therefore it suffices to have the number of iterations $k$ larger than
$$\frac{n}{\sqrt{\mu}}\log(C/\epsilon) \;=\; n\sqrt{\frac{R^2+\lambda\gamma n}{\lambda\gamma n}}\log(C/\epsilon) \;=\; \sqrt{n^2+\frac{nR^2}{\lambda\gamma}}\,\log(C/\epsilon) \;\le\; \left(n+\sqrt{\frac{nR^2}{\lambda\gamma}}\right)\log(C/\epsilon).$$
This finishes the proof.

Several state-of-the-art algorithms for ERM, including SDCA [19], SAG [15, 17] and SVRG [7, 23], obtain the iteration complexity
$$O\left(\Big(n+\frac{R^2}{\lambda\gamma}\Big)\log(1/\epsilon)\right). \qquad (17)$$
We note that our result in (16) can be much better for ill-conditioned problems, i.e., when the condition number $\frac{R^2}{\lambda\gamma}$ is larger than $n$. This is also confirmed by our numerical experiments in Section 4.

The complexity bound in (17) for the aforementioned work is for minimizing the primal objective $P(w)$ or the duality gap $P(w)-D(x)$, but our result in Theorem 2 is in terms of the dual optimality. In the full report [9], we show that the same guarantee on accelerated primal-dual convergence can be obtained by our method with an extra primal gradient step, without affecting the overall complexity. The experiments in Section 4 illustrate superior performance of our algorithm in reducing the primal objective value, even without performing the extra step.

We note that Shalev-Shwartz and Zhang [20] recently developed an accelerated SDCA method which achieves the same complexity $O\big((n+\sqrt{\tfrac{n}{\lambda\gamma}})\log(1/\epsilon)\big)$ as our method. Their method calls the SDCA method within a full-dimensional accelerated gradient method in an inner-outer iteration procedure. In contrast, our APCG method is a straightforward single-loop coordinate gradient method.
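To get a feel for the gap between (16) and (17), consider an illustrative ill-conditioned instance (our numbers, not from the paper) with $n=10^5$ and condition number $\frac{R^2}{\lambda\gamma}=10^8$. The bound (17) is on the order of
$$n+\frac{R^2}{\lambda\gamma} \;=\; 10^5+10^8 \;\approx\; 10^8$$
iterations per factor of $\log(1/\epsilon)$, while (16) gives
$$n+\sqrt{\frac{nR^2}{\lambda\gamma}} \;=\; 10^5+\sqrt{10^5\cdot 10^8} \;\approx\; 3.3\times 10^6,$$
roughly a thirty-fold reduction; when the condition number is below $n$, both bounds are dominated by $n$ and are of the same order.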
3.1 Implementation details

Here we show how to exploit the structure of the regularized ERM problem to efficiently compute the coordinate gradient $\nabla_{i_k} f(y^{(k)})$ and totally avoid full-dimensional updates in Algorithm 4. We focus on the special case $g(w)=\frac{1}{2}\|w\|^2$ and show how to compute $\nabla_{i_k} f(y^{(k)})$. According to (15),
$$\nabla_{i_k} f(y^{(k)}) = \frac{1}{\lambda n^2}A_{i_k}^T\big(Ay^{(k)}\big) + \frac{\gamma}{n}\,y_{i_k}^{(k)}.$$
Since we do not form $y^{(k)}$ in Algorithm 4, we update $Ay^{(k)}$ by storing and updating two vectors in $\mathbb{R}^d$: $p^{(k)}=Au^{(k)}$ and $q^{(k)}=Av^{(k)}$. The resulting method is detailed in Algorithm 5.

Algorithm 5: APCG for solving the dual ERM problem
Input: $x^{(0)}\in\mathrm{dom}(\Psi)$ and convexity parameter $\mu>0$.
Initialize: set $\alpha=\frac{\sqrt{\mu}}{n}$ and $\rho=\frac{1-\alpha}{1+\alpha}$, and let $u^{(0)}=0$, $v^{(0)}=x^{(0)}$, $p^{(0)}=0$ and $q^{(0)}=Ax^{(0)}$.
Iterate: repeat for $k=0,1,2,\ldots$
1. Choose $i_k\in\{1,\ldots,n\}$ uniformly at random and compute the coordinate gradient
$$\nabla_{i_k}^{(k)} = \frac{1}{\lambda n^2}\Big(\rho^{k+1}A_{i_k}^T p^{(k)} + A_{i_k}^T q^{(k)}\Big) + \frac{\gamma}{n}\Big(\rho^{k+1}u_{i_k}^{(k)} + v_{i_k}^{(k)}\Big).$$
2. Compute the coordinate increment
$$\Delta_{i_k}^{(k)} = \arg\min_{\Delta\in\mathbb{R}^{N_{i_k}}}\Big\{ \frac{\alpha\big(\|A_{i_k}\|^2+\lambda\gamma n\big)}{2\lambda n}\|\Delta\|^2 + \big\langle\nabla_{i_k}^{(k)},\,\Delta\big\rangle + \frac{1}{n}\phi_{i_k}^*\big(\rho^{k+1}u_{i_k}^{(k)} - v_{i_k}^{(k)} - \Delta\big)\Big\}.$$
3. Let $u^{(k+1)}=u^{(k)}$ and $v^{(k+1)}=v^{(k)}$, and update
$$u_{i_k}^{(k+1)} = u_{i_k}^{(k)} - \frac{1-n\alpha}{2\rho^{k+1}}\Delta_{i_k}^{(k)}, \qquad v_{i_k}^{(k+1)} = v_{i_k}^{(k)} + \frac{1+n\alpha}{2}\Delta_{i_k}^{(k)},$$
$$p^{(k+1)} = p^{(k)} - \frac{1-n\alpha}{2\rho^{k+1}}A_{i_k}\Delta_{i_k}^{(k)}, \qquad q^{(k+1)} = q^{(k)} + \frac{1+n\alpha}{2}A_{i_k}\Delta_{i_k}^{(k)}. \qquad (18)$$
Output: approximate primal and dual solutions
$$w^{(k+1)} = \frac{1}{\lambda n}\Big(\rho^{k+2}p^{(k+1)} + q^{(k+1)}\Big), \qquad x^{(k+1)} = \rho^{k+1}u^{(k+1)} + v^{(k+1)}.$$

Each iteration of Algorithm 5 only involves the two inner products $A_{i_k}^T p^{(k)}$ and $A_{i_k}^T q^{(k)}$ in computing $\nabla_{i_k}^{(k)}$, and the two vector additions in (18). They all cost $O(d)$ rather than $O(n)$. When the $A_i$'s are sparse (the case in most large-scale problems), these operations can be carried out very efficiently. Basically, each iteration of Algorithm 5 only costs twice as much as that of SDCA [6, 19].

4 Experiments

In our experiments, we solve ERM problems with the smoothed hinge loss for binary classification. That is, we pre-multiply each feature vector $A_i$ by its label $b_i\in\{\pm 1\}$ and use the loss function
$$\phi_i(a) = \begin{cases} 0 & \text{if } a\ge 1, \\ 1-a-\frac{\gamma}{2} & \text{if } a\le 1-\gamma, \\ \frac{1}{2\gamma}(1-a)^2 & \text{otherwise.} \end{cases}$$
The conjugate function of $\phi_i$ is $\phi_i^*(b)=b+\frac{\gamma}{2}b^2$ if $b\in[-1,0]$ and $+\infty$ otherwise. Therefore we have
$$\Psi_i(x_i) = \frac{1}{n}\Big(\phi_i^*(-x_i)-\frac{\gamma}{2}\|x_i\|^2\Big) = \begin{cases} -\frac{x_i}{n} & \text{if } x_i\in[0,1], \\ +\infty & \text{otherwise.} \end{cases}$$
The datasets used in our experiments are summarized in Table 1.
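For readers who want to plug the smoothed hinge loss above into their own code, here is a small transcription (ours, purely illustrative; the helper names are made up) of $\phi$, its conjugate $\phi^*$, and the resulting $\Psi_i$:

```python
import numpy as np

GAMMA = 1.0  # smoothing parameter gamma of the loss (illustrative value)

def smoothed_hinge(a, gamma=GAMMA):
    """phi(a): 0 for a >= 1; 1 - a - gamma/2 for a <= 1 - gamma; (1-a)^2/(2*gamma) in between."""
    if a >= 1.0:
        return 0.0
    if a <= 1.0 - gamma:
        return 1.0 - a - gamma / 2.0
    return (1.0 - a) ** 2 / (2.0 * gamma)

def smoothed_hinge_conj(b, gamma=GAMMA):
    """phi*(b) = b + (gamma/2)*b^2 on [-1, 0], +infinity elsewhere."""
    return b + 0.5 * gamma * b ** 2 if -1.0 <= b <= 0.0 else np.inf

def psi_i(x_i, n, gamma=GAMMA):
    """Psi_i(x_i) = (1/n)*(phi*(-x_i) - (gamma/2)*x_i^2) = -x_i/n on [0, 1], +infinity elsewhere."""
    return -x_i / n if 0.0 <= x_i <= 1.0 else np.inf
```

With this loss, the coordinate increment in Step 2 of Algorithm 5 reduces to minimizing a one-dimensional quadratic over an interval, so it has a closed-form (clipped) solution.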
[Figure 1 plots omitted in this transcript. The grid of plots has three columns corresponding to the datasets rcv1, covtype and news20, and four rows corresponding to the regularization parameter $\lambda = 10^{-5}, 10^{-6}, 10^{-7}, 10^{-8}$; each plot shows curves for AFG, SDCA and APCG.]

Figure 1: Comparing the APCG method with SDCA and the accelerated full gradient method (AFG) with adaptive line search. In each plot, the vertical axis is the primal objective gap $P(w^{(k)})-P^\star$, and the horizontal axis is the number of passes through the entire dataset. The three columns correspond to the three datasets, and each row corresponds to a particular value of the regularization parameter $\lambda$.

In our experiments, we compare the APCG method with SDCA and the accelerated full gradient method (AFG) [12] with an additional line search procedure to improve efficiency. When the regularization parameter $\lambda$ is not too small (around $10^{-4}$), APCG performs similarly to SDCA, as predicted by our complexity results, and they both outperform AFG by a substantial margin. Figure 1 shows the results in the ill-conditioned setting, with $\lambda$ varying from $10^{-5}$ to $10^{-8}$. Here we see that APCG has superior performance in reducing the primal objective value compared with SDCA and AFG, even though our theory only gives complexity guarantees for solving the dual ERM problem. AFG eventually catches up for the cases with very large condition number (see the plots for $\lambda=10^{-8}$).

Table 1: Characteristics of the three binary classification datasets (available from the LIBSVM web page: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets).

dataset    number of samples n    number of features d    sparsity
rcv1       20,242                 47,236                  0.16%
covtype    581,012                54                      22%
news20     19,996                 1,355,191               0.04%

References

[1] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[2] A. Beck and L. Tetruashvili. On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 13(4):2037–2060, 2013.
[3] K.-W. Chang, C.-J. Hsieh, and C.-J. Lin. Coordinate descent method for large-scale l2-loss linear support vector machines. Journal of Machine Learning Research, 9:1369–1398, 2008.
[4] O. Fercoq and P. Richtárik. Accelerated, parallel and proximal coordinate descent. Manuscript, arXiv:1312.5799, 2013.
[5] M. Hong, X. Wang, M. Razaviyayn, and Z. Q. Luo. Iteration complexity analysis of block coordinate descent methods. Manuscript, arXiv:1310.6957, 2013.
[6] C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 408–415, 2008.
[7] R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Advances in Neural Information Processing Systems 26, pages 315–323, 2013.
[8] Y. T. Lee and A. Sidford. Efficient accelerated coordinate descent methods and faster algorithms for solving linear systems. arXiv:1305.1922.
[9] Q. Lin, Z. Lu, and L. Xiao. An accelerated proximal coordinate gradient method and its application to regularized empirical risk minimization. Technical Report MSR-TR-2014-94, Microsoft Research, 2014. (arXiv:1407.1296).
[10] Z. Lu and L. Xiao. On the complexity analysis of randomized block-coordinate descent methods. Accepted by Mathematical Programming, Series A, 2014. (arXiv:1305.4723).
[11] Z. Q. Luo and P. Tseng. On the convergence of the coordinate descent method for convex differentiable minimization. Journal of Optimization Theory & Applications, 72(1):7–35, 2002.
[12] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, 2004.
[13] Y. Nesterov. Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2):341–362, 2012.
[14] P. Richtárik and M. Takáč. Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Mathematical Programming, 144(1):1–38, 2014.
[15] N. Le Roux, M. Schmidt, and F. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In Advances in Neural Information Processing Systems 25, pages 2672–2680, 2012.
[16] A. Saha and A. Tewari. On the non-asymptotic convergence of cyclic coordinate descent methods. SIAM Journal on Optimization, 23:576–601, 2013.
[17] M. Schmidt, N. Le Roux, and F. Bach. Minimizing finite sums with the stochastic average gradient. Technical Report HAL 00860051, INRIA, Paris, France, 2013.
[18] S. Shalev-Shwartz and A. Tewari. Stochastic methods for l1-regularized loss minimization. In Proceedings of the 26th International Conference on Machine Learning (ICML), pages 929–936, Montreal, Canada, 2009.
[19] S. Shalev-Shwartz and T. Zhang. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research, 14:567–599, 2013.
[20] S. Shalev-Shwartz and T. Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In Proceedings of the 31st International Conference on Machine Learning (ICML), JMLR W&CP, 32(1):64–72, 2014.
[21] P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 140:513–535, 2001.
[22] P. Tseng. On accelerated proximal gradient methods for convex-concave optimization. Unpublished manuscript, 2008.
[23] L. Xiao and T. Zhang. A proximal stochastic gradient method with progressive variance reduction. Technical Report MSR-TR-2014-38, Microsoft Research, 2014. (arXiv:1403.4699).
