# CS545: Gradient Descent

## Presentation transcript

Page 1
CS545: Gradient Descent
Chuck Anderson
Department of Computer Science, Colorado State University
Fall, 2009
Page 2
Outline

Gradient Descent Parabola Examples in R
Page 3
Finding Minimum of Parabola

Find the $x$ that is the minimum of

$$f(x) = 1.2(x - 2)^2 + 3.2$$

or, said another way, find $\operatorname{argmin}_x f(x)$. How? Yep. Take the derivative, set it equal to zero, and try to solve for $x$.

$$\frac{df}{dx} = 1.2(2)(x - 2) = 2.4(x - 2)$$

$$\frac{df}{dx} = 0 = 2.4(x - 2) \quad\Rightarrow\quad x = 2$$

[Figure: plot of $1.2(x - 2)^2 + 3.2$ with the closed-form solution marked.]
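As a quick numerical cross-check of the closed-form answer, the root of the derivative can also be located with base R's `uniroot`. This is only a sketch: the name `dfdx` and the search interval [0, 4] are choices made here, not part of the slides.

```r
# Derivative of f(x) = 1.2 (x - 2)^2 + 3.2
dfdx <- function(x) 2.4 * (x - 2)

# Find where the derivative crosses zero on an assumed bracket [0, 4].
root <- uniroot(dfdx, interval = c(0, 4))$root
print(root)
```

The bracket must contain a sign change of the derivative, which is why an interval around the expected minimum is used.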
Page 5

Gradient Descent

But, if $\frac{df}{dx} = 0$ cannot be solved directly for $x$, what can we do? Start at some $x$ value, use the derivative at that value to tell us which way to move, and repeat. This is gradient descent. A step factor, written $\eta$ here, scales the derivative to control how far to go.

$$\frac{df}{dx} = 2.4(x - 2)$$

$$x^{(0)} = 0 \quad \text{(for example)}$$

$$x^{(n)} = x^{(n-1)} - \eta \, 2.4\,(x^{(n-1)} - 2)$$

[Figure: the parabola with the closed-form solution and the gradient descent steps marked.]
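One way to see how the step factor matters for this particular parabola is to track the distance from the minimum at $x = 2$. Writing the step factor as $\eta$ (a symbol assumed here) and subtracting 2 from both sides of the update gives:

$$x^{(n)} - 2 = (1 - 2.4\eta)\,(x^{(n-1)} - 2) = (1 - 2.4\eta)^{n}\,(x^{(0)} - 2)$$

So for this $f$ the iterates converge when $|1 - 2.4\eta| < 1$, that is, $0 < \eta < 1/1.2 \approx 0.83$, and $\eta = 1/2.4$ reaches $x = 2$ in a single step. This holds only because $f$ is quadratic; for other functions it is at best a local guide.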
Page 8
For a parabola, we can get there much faster if we also know the second derivative, which is what?

$$\frac{df}{dx} = 2.4(x - 2) \qquad \frac{d^2 f}{dx^2} = f''(x) = 2.4$$

and use Newton's method (see the Wikipedia entry for "Newton's method"):

$$x^{(n)} = x^{(n-1)} - \frac{f'(x^{(n-1)})}{f''(x^{(n-1)})} = x^{(n-1)} - \frac{2.4\,(x^{(n-1)} - 2)}{2.4} = x^{(n-1)} - (x^{(n-1)} - 2)$$

[Figure: the parabola with the Newton's method step and the gradient descent steps marked.]
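The last form of the update simplifies to $x^{(n)} = 2$ regardless of the starting point, which is why Newton's method finishes a parabola in one step. A small sketch to confirm this numerically follows; the starting points are arbitrary values picked here.

```r
grad <- function(x) 2.4 * (x - 2)
secondGrad <- function(x) 2.4

# One Newton step from several (arbitrary) starting points.
starts <- c(-10, 0.1, 3.5, 100)
afterOneStep <- starts - grad(starts) / secondGrad(starts)
print(afterOneStep)
```

Every entry should land on the minimum at x = 2, up to floating-point rounding.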
Page 11
Gradient Descent

If the function is not a parabola, what can we do? We cannot solve directly for $x$. We can still do gradient descent.

Can we always use Newton's method? No.

Reason 1: If $x$ has 1000 components, the second derivative (Hessian) is a 1000 × 1000 matrix. May be too big.

Reason 2: If the function is not a parabola, the second derivative information may lead you very far away. When?
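One possible answer to "When?" is: when the second derivative is close to zero (or negative), so that dividing by it produces a huge step. The sketch below uses $f(x) = \sqrt{1 + x^2}$, a function chosen here for illustration and not taken from the slides. Its minimum is at $x = 0$, but its curvature flattens out as $|x|$ grows.

```r
# Illustrative non-parabola (an assumption, not from the slides):
# g(x) = sqrt(1 + x^2), with its first and second derivatives.
gGrad <- function(x) x / sqrt(1 + x^2)
gSecondGrad <- function(x) (1 + x^2)^(-3 / 2)

# Run a few Newton steps and record every x visited.
newtonTrace <- function(x, nSteps = 3) {
  xtrace <- x
  for (step in 1:nSteps) {
    x <- x - gGrad(x) / gSecondGrad(x)
    xtrace <- c(xtrace, x)
  }
  xtrace
}

nearTrace <- newtonTrace(0.5)  # starts where the curvature is still large
farTrace <- newtonTrace(2)     # starts where the curvature is small
print(nearTrace)
print(farTrace)
```

For this function the Newton update works out to $x \leftarrow -x^3$, so starts with $|x| < 1$ shrink toward 0 while starts with $|x| > 1$ are thrown farther away on every step.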
Page 14
Approximating the Second Derivative

Say we have picked a direction, $d$, to go. Rather than compute the second derivative in that direction, we can approximate it using two first derivative values:

$$f''(x) \approx \frac{f'(x + \alpha d) - f'(x)}{\alpha} \quad \text{for } 0 < \alpha \ll 1$$

In practice, Moller found he had to modify this by adding $\lambda d$, where $\lambda$ is set to a value for which the resulting approximated second derivative is well behaved:

$$f''(x) \approx \frac{f'(x + \alpha d) - f'(x)}{\alpha} + \lambda d \quad \text{for } 0 < \alpha \ll 1$$

This gives us a way to scale the step size.
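A minimal sketch of the finite-difference idea on the slides' parabola, assuming a one-dimensional direction of 1 and a small offset of 1e-4 (both values picked here for illustration). The helper name `approxSecondGrad` is hypothetical and is not claimed to be part of gradientDescents.R; Moller's additional correction term is left out.

```r
grad <- function(x) 2.4 * (x - 2)

# Approximate the second derivative along direction d
# from two first-derivative values (hypothetical helper).
approxSecondGrad <- function(grad, x, d, alpha = 1e-4) {
  (grad(x + alpha * d) - grad(x)) / alpha
}

approx <- approxSecondGrad(grad, x = 0.1, d = 1)
print(approx)
```

For the parabola the result should be very close to the exact second derivative, 2.4, while costing only one extra gradient evaluation.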
Page 17
Picking a Good Direction

Now, how about that direction? How do we decide that? Moller uses conjugate gradients. (See the Wikipedia entry for "conjugate gradient".) The conjugate gradient direction is based on the previous direction and the current gradient.
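To make "based on the previous direction and the current gradient" concrete, here is a sketch of linear conjugate gradient on a small two-dimensional quadratic. The matrix `A`, the vector `b`, and the Fletcher-Reeves formula for `beta` are choices made for this example; Moller's scaled conjugate gradient uses its own update and step-size rule, which this sketch does not reproduce.

```r
# Minimize 0.5 * x'Ax - b'x (illustrative values, not from the slides).
A <- matrix(c(3, 1, 1, 2), nrow = 2)
b <- c(1, 1)
gradQ <- function(x) as.vector(A %*% x) - b

x <- c(0, 0)
g <- gradQ(x)
d <- -g                                   # first direction: steepest descent
for (step in 1:2) {
  stepSize <- sum(g * g) / sum(d * as.vector(A %*% d))  # exact line search
  x <- x + stepSize * d
  gNew <- gradQ(x)
  beta <- sum(gNew * gNew) / sum(g * g)   # Fletcher-Reeves coefficient
  d <- -gNew + beta * d                   # current gradient plus previous direction
  g <- gNew
}
exact <- solve(A, b)
print(x)
print(exact)
```

On an n-dimensional quadratic with exact line searches, this scheme needs at most n steps, which is why two iterations suffice here.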
Page 20

Parabola Example

    f <- function(x) 1.2 * (x - 2)^2 + 3.2
    grad <- function(x) 2 * 1.2 * (x - 2)    # df/dx = 2.4 (x - 2)
    secondGrad <- function(x) 2.4            # second derivative is constant
Page 21
Steepest Descent

    xs <- seq(0, 4, len = 20)
    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x - 2)^2 + 3.2))

    ### df/dx = 2.4(x - 2)
    ### df/dx = 0
    ### 0 = 2.4x - 4.8
    ### x = 2
    lines(c(2, 2), c(3, 8), col = "red", lty = 2)
    text(2.1, 7, "Closed form solution", col = "red", pos = 4)

    ### gradient descent
    x <- 0.1
    xtrace <- x
    ftrace <- f(x)
    stepFactor <- 0.6    ### try larger and smaller values (0.8 and 0.01)
    for (step in 1:100) {
      x <- x - stepFactor * grad(x)
      xtrace <- c(xtrace, x)
      ftrace <- c(ftrace, f(x))
    }
    lines(xtrace, ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent", col = "blue", pos = 4)
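The comment in the code suggests trying other step factors. A rough way to compare them is to count how many steps are needed to get within a tolerance of x = 2. The helper `stepsToConverge`, the tolerance, the step cap, and the extra step factor 0.9 are all assumptions made for this sketch.

```r
grad <- function(x) 2.4 * (x - 2)

# Count gradient descent steps until |x - 2| < tol (hypothetical helper).
stepsToConverge <- function(stepFactor, x = 0.1, tol = 1e-6, maxSteps = 1000) {
  for (step in 1:maxSteps) {
    x <- x - stepFactor * grad(x)
    if (abs(x - 2) < tol) return(step)
  }
  NA  # never got within tol of the minimum
}

counts <- sapply(c(0.6, 0.8, 0.01, 0.9), stepsToConverge)
print(counts)
```

The step factor 0.9 is included to show a case that overshoots by more on every step and never settles, so its count comes back as `NA`.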
Page 22
[Figure: plot of 1.2(x - 2)^2 + 3.2 showing the closed-form solution and the gradient descent trace.]
Page 23
Steepest Descent with gradientDescents.R

    source("gradientDescents.R")
    x <- 0.1
    result <- steepest(x, f, grad, stepsize = 0.6, nIterations = 100,
                       xtracep = TRUE, ftracep = TRUE)
    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x - 2)^2 + 3.2))
    lines(result$xtrace, result$ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent with steepest()", col = "blue", pos = 4)
Page 24
Steepest Descent scaled with Newton's Method

    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x - 2)^2 + 3.2))
    x <- 0.1
    xtrace <- x
    ftrace <- f(x)
    for (step in 1:100) {
      x <- x - grad(x) / secondGrad(x)    # Newton step: gradient scaled by second derivative
      xtrace <- c(xtrace, x)
      ftrace <- c(ftrace, f(x))
    }
    lines(xtrace, ftrace, type = "b", col = "blue")
    text(0.5, 6, "Newton's Gradient Descent", col = "blue", pos = 4)
Page 25
With Scaled Conjugate Gradient from gradientDescents.R

    source("gradientDescents.R")
    x <- 0.1
    result <- scg(x, f, grad, nIterations = 100, xtracep = TRUE, ftracep = TRUE)
    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x - 2)^2 + 3.2))
    lines(result$xtrace, result$ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent with scg()", col = "blue", pos = 4)
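If gradientDescents.R is not at hand, base R's `optim` offers a conjugate gradient method that can serve as a rough stand-in. Note that this is a different algorithm from Moller's scaled conjugate gradient and it does not return per-step traces of x and f(x); it is shown only as a sanity check on the location of the minimum.

```r
f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)

# optim's "CG" is a nonlinear conjugate gradient method (not Moller's SCG).
result <- optim(par = 0.1, fn = f, gr = grad, method = "CG")
print(result$par)    # location of the minimum found
print(result$value)  # function value at that location
```

The returned list also carries `counts`, which reports how many function and gradient evaluations were used.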