Semi-Stochastic Gradient Descent Methods - PowerPoint Presentation

Presentation Transcript

Slide1

Semi-Stochastic Gradient Descent Methods

Jakub Konečný

(joint work with Peter Richtárik)

University of Edinburgh

Slide2

Introduction

Slide3

Large scale problem setting

Problems are often structured

Frequently arising in machine learning

Structure – sum of functions: minimize $F(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)$, where $n$ is BIG

Slide4

Examples

Linear regression (least squares): $f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2$

Logistic regression (classification): $f_i(x) = \log\left(1 + \exp(-b_i\, a_i^\top x)\right)$
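As a concrete, hedged illustration of these objectives, the sketch below writes both losses in Python; the names A (data matrix), b (targets/labels), and the function names are assumptions for illustration, not notation from the slides.

```python
import numpy as np

def least_squares_fi(x, a_i, b_i):
    """One least-squares term: f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    r = a_i @ x - b_i
    return 0.5 * r ** 2

def logistic_fi(x, a_i, b_i):
    """One logistic-loss term: f_i(x) = log(1 + exp(-b_i * a_i^T x)), b_i in {-1, +1}."""
    return np.log1p(np.exp(-b_i * (a_i @ x)))

def F(x, A, b, fi=least_squares_fi):
    """The structured objective: F(x) = (1/n) * sum_i f_i(x)."""
    n = A.shape[0]
    return sum(fi(x, A[i], b[i]) for i in range(n)) / n
```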

Slide5

Assumptions

Lipschitz continuity of the derivative of each $f_i$: $\|\nabla f_i(x) - \nabla f_i(y)\| \le L\,\|x - y\|$

Strong convexity of $F$: $F(x) \ge F(y) + \nabla F(y)^\top (x - y) + \tfrac{\mu}{2}\,\|x - y\|^2$

Slide6

Gradient Descent (GD)

Update rule: $x_{k+1} = x_k - h\,\nabla F(x_k)$

Fast convergence rate: with stepsize $h = 1/L$, $F(x_k) - F(x_*) \le \left(1 - \tfrac{\mu}{L}\right)^{k} \left(F(x_0) - F(x_*)\right)$

Alternatively, for accuracy $\varepsilon$ we need $O\!\left(\kappa \log(1/\varepsilon)\right)$ iterations, where $\kappa = L/\mu$ is the condition number

Complexity of a single iteration – $n$ (measured in gradient evaluations)
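A minimal sketch of this update, assuming a user-supplied component-gradient function grad_fi (all names here are illustrative):

```python
import numpy as np

def gradient_descent(grad_fi, A, b, x0, h=0.1, iters=100):
    """Plain GD: each iteration forms the full gradient of
    F(x) = (1/n) * sum_i f_i(x), costing n gradient evaluations."""
    x = np.asarray(x0, dtype=float).copy()
    n = A.shape[0]
    for _ in range(iters):
        g = sum(grad_fi(x, A[i], b[i]) for i in range(n)) / n  # full gradient
        x = x - h * g  # update rule: x_{k+1} = x_k - h * grad F(x_k)
    return x
```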

Slide7

Stochastic Gradient Descent (SGD)

Update rule: $x_{k+1} = x_k - h_k\,\nabla f_i(x_k)$, with $i$ picked uniformly at random and $h_k$ a step-size parameter

Why it works: the direction is an unbiased estimate of the gradient, $\mathbb{E}\left[\nabla f_i(x)\right] = \nabla F(x)$

Slow convergence: $O(1/k)$

Complexity of a single iteration – $1$ (measured in gradient evaluations)
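For contrast, a hedged SGD sketch using the same assumed grad_fi; the decaying step size $h_k = h_0/(k+1)$ is one common choice, not necessarily the one used in the talk:

```python
import numpy as np

def sgd(grad_fi, A, b, x0, h0=1.0, iters=1000, seed=0):
    """SGD: each iteration costs a single gradient evaluation,
    but the step size must decay for convergence."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    n = A.shape[0]
    for k in range(iters):
        i = rng.integers(n)  # pick one component uniformly at random
        x = x - (h0 / (k + 1)) * grad_fi(x, A[i], b[i])
    return x
```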

Slide8

Goal

GD: fast convergence, but $n$ gradient evaluations in each iteration

SGD: complexity of each iteration is independent of $n$, but slow convergence

Combine the two in a single algorithm

Slide9

Semi-Stochastic Gradient Descent

S2GD

Slide10

Intuition

The gradient does not change drastically

We could reuse the information from an “old” gradient

Slide11

Modifying the “old” gradient

Imagine someone gives us a “good” point $y$ and the full gradient $\nabla F(y)$

The gradient at a point $x$, near $y$, can be expressed as

$\nabla F(x) = \nabla F(y) + \left(\nabla F(x) - \nabla F(y)\right)$

Approximation of the gradient: keep the already computed gradient $\nabla F(y)$ and estimate the gradient change

We can try to estimate the change cheaply with a single random component: $\nabla F(x) \approx \nabla F(y) + \nabla f_i(x) - \nabla f_i(y)$
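A quick numerical sanity check of this estimate (my own illustration, on a least-squares instance with made-up data): averaging $\nabla F(y) + \nabla f_i(x) - \nabla f_i(y)$ over all $i$ recovers $\nabla F(x)$ exactly, so the estimate is unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

def grad_fi(x, a_i, b_i):
    """Gradient of the least-squares term f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return (a_i @ x - b_i) * a_i

def grad_F(x):
    return sum(grad_fi(x, A[i], b[i]) for i in range(n)) / n

x, y = rng.standard_normal(d), rng.standard_normal(d)
estimates = [grad_F(y) + grad_fi(x, A[i], b[i]) - grad_fi(y, A[i], b[i])
             for i in range(n)]
print(np.allclose(np.mean(estimates, axis=0), grad_F(x)))  # prints True
```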

Slide12

The S2GD Algorithm

Simplification: the size of the inner loop is random, following a geometric rule
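The algorithm itself appeared as a figure on this slide, so the Python sketch below is a reconstruction under assumptions rather than the authors' exact pseudocode: each epoch computes one full gradient at a reference point y, then runs a random number of cheap variance-reduced inner steps, with the inner-loop length drawn by a geometric-style rule (the parameter nu_h, standing in for the product of strong-convexity constant and stepsize, is my naming).

```python
import numpy as np

def s2gd(grad_fi, A, b, y0, h=0.05, m=100, epochs=30, nu_h=0.0, seed=0):
    """S2GD sketch. Per epoch: n evaluations for the full gradient,
    plus 2 evaluations per inner step (at most 2*m)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    y = np.asarray(y0, dtype=float).copy()
    for _ in range(epochs):
        # Expensive step: full gradient at the reference point y.
        g = sum(grad_fi(y, A[i], b[i]) for i in range(n)) / n
        # Inner-loop length: geometric-style rule favouring longer runs;
        # with nu_h = 0 this reduces to the uniform choice.
        w = (1.0 - nu_h) ** np.arange(m, 0, -1)
        t = rng.choice(np.arange(1, m + 1), p=w / w.sum())
        x = y.copy()
        for _ in range(t):
            i = rng.integers(n)
            # Cheap step: old full gradient corrected by the change in f_i.
            x = x - h * (g + grad_fi(x, A[i], b[i]) - grad_fi(y, A[i], b[i]))
        y = x
    return y
```

With nu_h = 0 and a fixed inner-loop length this collapses to SVRG, consistent with the "special case" remark later in the deck.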

Slide13

Theorem

Slide14

Convergence rate

How to set the parameters $h$ (stepsize) and $m$ (size of the inner loop)?

The rate has two terms: one can be made arbitrarily small by decreasing the stepsize $h$

For any fixed $h$, the other can be made arbitrarily small by increasing the inner-loop size $m$

Slide15

Setting the parameters

Fix a target accuracy $\varepsilon$

The accuracy is achieved by setting the stepsize $h$, the # of iterations of the inner loop $m$, and the # of epochs appropriately

Total complexity (in gradient evaluations): (# of epochs) × ($n$ for the full gradient evaluation + $2m$ for the cheap inner iterations)

Slide16

Complexity

S2GD complexity: $O\!\left((n + \kappa)\log(1/\varepsilon)\right)$ gradient evaluations in total, with condition number $\kappa = L/\mu$

GD complexity: $O\!\left(\kappa\log(1/\varepsilon)\right)$ iterations × $n$ (complexity of a single iteration) = $O\!\left(n\kappa\log(1/\varepsilon)\right)$ in total
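To make the gap concrete, a back-of-the-envelope comparison; the values $n = 10^6$ and $\kappa = 10^4$ are assumed for illustration and do not come from the slides.

```latex
% Assumed: n = 10^6 components, condition number kappa = 10^4.
\[
\text{GD: } \; n\,\kappa\,\log(1/\varepsilon) \;=\; 10^{10}\,\log(1/\varepsilon)
\qquad
\text{S2GD: } \; (n+\kappa)\,\log(1/\varepsilon) \;\approx\; 10^{6}\,\log(1/\varepsilon)
\]
% S2GD saves roughly a factor of kappa = 10^4 in gradient evaluations.
```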

Slide17

Related Methods

SAG – Stochastic Average Gradient (Mark Schmidt, Nicolas Le Roux, Francis Bach, 2013)

Refreshes a single stochastic gradient in each iteration

Needs to store $n$ gradients

Similar convergence rate

Cumbersome analysis

MISO – Minimization by Incremental Surrogate Optimization (Julien Mairal, 2014)

Similar to SAG, slightly worse performance

Elegant analysis

Slide18

Related Methods

SVRG – Stochastic Variance Reduced Gradient (Rie Johnson, Tong Zhang, 2013)

Arises as a special case of S2GD

Prox-SVRG (Lin Xiao, Tong Zhang, 2014)

Extends SVRG to the proximal setting

EMGD – Epoch Mixed Gradient Descent (Lijun Zhang, Mehrdad Mahdavi, Rong Jin, 2013)

Handles simple constraints; worse convergence rate

Slide19

Experiment

Example problem, with