B Lecture: Gradient Descent
Kris Hauser
January

The first multivariate optimization technique


Gradient descent is an iterative method that is given an initial point and follows the negative of the gradient in order to move the point toward a critical point, which is hopefully the desired local minimum. Again, we are concerned with only local optimization.


Consider moving from $x_1$ a small amount $h$ in a unit direction $u$. We want to find the $u$ that minimizes $f(x_1 + hu)$. Using the Taylor expansion, we see that

    $f(x_1 + hu) - f(x_1) = h \nabla f(x_1) \cdot u + h^2 O(1)$.    (2)

If we make the $h^2$ term insignificant by shrinking $h$, we see that in order to decrease $f(x_1 + hu) - f(x_1)$ the fastest we must minimize $\nabla f(x_1) \cdot u$. The unit vector that minimizes $\nabla f(x_1) \cdot u$ is $u = -\nabla f(x_1) / ||\nabla f(x_1)||$, as desired.

1.2 Algorithm

The algorithm is initialized with a guess $x_1$, a maximum iteration count $N_{max}$, a gradient norm tolerance $\epsilon_g$ that is used to determine whether the algorithm has arrived at a critical point, and a step tolerance $\epsilon_x$ to determine whether significant progress is being made. It proceeds as follows.

1. For $t = 1, 2, \ldots, N_{max}$:
2.     $x_{t+1} \leftarrow x_t - \alpha_t \nabla f(x_t)$
3.     If $||\nabla f(x_{t+1})|| < \epsilon_g$ then return "Converged on critical point"
4.     If $||x_t - x_{t+1}|| < \epsilon_x$ then return "Converged on an x value"
5.     If $f(x_{t+1}) > f(x_t)$ then return "Diverging"
6. Return "Maximum number of iterations reached"

The variable $\alpha_t$ is known as the step size, and should be chosen to maintain a balance between convergence speed and avoiding divergence. Note that $\alpha_t$ may depend on the step $t$.

1.3 Choosing a step size using line search

To a first-order approximation, each step decreases the value of $f$ by approximately $\alpha_t ||\nabla f(x_t)||^2$. If $\alpha_t$ is too small, then the algorithm will converge very slowly. On the other hand, if the step size $\alpha_t$ is not chosen small enough, then the algorithm may fail to reduce the value of $f$, because the first-order approximation is valid only locally. One approach is to adapt the step size $\alpha_t$ in order to achieve a reduction in $f$ while still making sufficiently fast progress. This procedure is known as a line search.

[...] axis than the $x_1$ axis. So, the line search will not be directed toward the origin. This gives gradient descent a characteristic zig-zagging trajectory that takes longer to converge to the minimum. In general, slow convergence holds when the "bowl" around the local minimum is much thinner in one direction than another. Problems like this are said to have an ill-conditioned Hessian matrix. We will make this more precise in future lectures.
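To make Sections 1.2 and 1.3 concrete, here is a minimal Python sketch of the gradient descent loop combined with a simple backtracking step-size rule. The helper names (f, grad_f), the backtracking constants, and the default tolerances are illustrative assumptions, not values given in these notes.

    import numpy as np

    def gradient_descent(f, grad_f, x1, alpha0=1.0, N_max=1000,
                         eps_g=1e-6, eps_x=1e-8, beta=0.5):
        # f: objective R^n -> R; grad_f: its gradient R^n -> R^n
        # alpha0: step size tried first at each iteration
        # beta: shrink factor used when a step fails to reduce f
        x = np.asarray(x1, dtype=float)
        for t in range(N_max):
            g = grad_f(x)
            # Backtracking: shrink alpha until the step actually reduces f.
            alpha = alpha0
            x_next = x - alpha * g
            while f(x_next) > f(x) and alpha > 1e-16:
                alpha *= beta
                x_next = x - alpha * g
            if np.linalg.norm(grad_f(x_next)) < eps_g:
                return x_next, "Converged on critical point"
            if np.linalg.norm(x - x_next) < eps_x:
                return x_next, "Converged on an x value"
            if f(x_next) > f(x):
                return x, "Diverging"
            x = x_next
        return x, "Maximum number of iterations reached"

    # Example use on an elongated quadratic bowl (an ill-conditioned case):
    f = lambda x: x[0]**2 + 10.0 * x[1]**2
    grad_f = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
    x_min, status = gradient_descent(f, grad_f, np.array([1.0, 1.0]))

Note that this is only a sketch: the backtracking rule above merely requires that $f$ decreases, whereas a more careful line search would enforce a sufficient-decrease condition.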
2 Variants

2.1 Steepest descent in discrete spaces

Gradient descent can be generalized to spaces that involve a discrete component. The method of steepest descent is the discrete analogue of gradient descent, but the best move is computed using a local minimization rather than computing a gradient. It is typically able to converge in few steps, but it is unable to escape local minima or plateaus in the objective function.

A discrete search problem is defined by a discrete state space $S$ and a set of edges $E \subseteq S \times S$. This induces a graph $G = (S, E)$ that must be searched in order to find a state that minimizes a given function $f(s)$. A steepest descent search begins at a state $s_0$ and takes steps on $G$ that descend $f(s)$ with the maximum rate of descent. The algorithm repeatedly computes each transition $s \rightarrow s'$ by performing the local minimization $\arg\min_{s \rightarrow s' \in E} f(s')$, and terminates when $f(s') \geq f(s)$. This approach is typically much faster than an exhaustive search if $S$ is large or possibly infinite but the degree of $G$ is small.

2.2 Hill climbing

Hill climbing is a similar approach to steepest descent that is used for large discrete problems in which the state space is combinatorial. (Although the name is hill climbing, the approach can be applied to either minimization or maximization.) If the state is composed of several attributes $s = (s_1, \ldots, s_n)$, an optimization can be formulated over the composite state space $S = S_1 \times \cdots \times S_n$. In each iteration, each of the attributes $s_i$ is locally optimized in $S_i$ using a single step of steepest descent. These sweeps continue until convergence or a termination condition is reached.

3 Exercises

1. Show that for a decomposable function $f(x_1, x_2) = g(x_1) + h(x_2)$, the $(x_1^*, x_2^*)$ that optimizes $f$ can be computed by optimizing $g(x_1)$ and $h(x_2)$ individually. If each evaluation of $g$, $\nabla g$, $h$, and $\nabla h$ has cost 1, determine the number of operations saved by performing individual gradient descents rather than an overall gradient descent (make assumptions about the number of steps taken until convergence).

2. For the quadratic function $f(x_1, x_2) = a x_1^2 + b x_2^2$, find an exact expression for the next $x_{t+1}$ computed by an optimal line search starting from $x_t$. Determine the rate of convergence from this expression.

3. If $p$ is the probability of sampling an initial guess in the basin of attraction of the global minimum, then what is the probability that one of $n$ random restarts reaches a global minimum? (A sketch of this strategy appears after these exercises.)
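As a companion to exercise 3 (not a solution to it), here is one possible sketch of the random-restart strategy it refers to. It reuses the gradient_descent sketch from above; the sampler sample_initial_guess is an assumed user-supplied routine that draws a random initial point.

    def random_restarts(f, grad_f, sample_initial_guess, n):
        # Run n independent gradient descents from random initial guesses
        # and keep the best local minimum found.
        best_x, best_f = None, float("inf")
        for _ in range(n):
            x0 = sample_initial_guess()
            x, _status = gradient_descent(f, grad_f, x0)
            if f(x) < best_f:
                best_x, best_f = x, f(x)
        return best_x, best_f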