Page 1

CS545: Gradient Descent
Chuck Anderson
Department of Computer Science
Colorado State University
Fall, 2009

Page 2

Outline

Gradient Descent
Parabola Examples in R

Page 3

Finding Minimum of Parabola

Find the $x$ that is the minimum of $f(x) = 1.2(x - 2)^2 + 3.2$ or, said another way, find $\operatorname{argmin}_x f(x)$.

How? Yep. Take the derivative, set it equal to zero, and try to solve for $x$.

$$f(x) = 1.2(x - 2)^2 + 3.2$$
$$\frac{df}{dx} = 1.2(2)(x - 2) = 2.4(x - 2)$$
$$\frac{df}{dx} = 0 = 2.4(x - 2)$$
$$x = 2$$

[Figure: plot of $1.2(x - 2)^2 + 3.2$ with the closed-form solution marked]
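As a quick numerical check of the closed-form answer, the sketch below defines the parabola and its derivative and asks R's `uniroot()` to find where the derivative crosses zero, then compares with the built-in one-dimensional minimizer `optimize()`. The search interval `c(0, 4)` is an assumption chosen only because it brackets the minimum.

```r
# Parabola from the slide and its derivative
f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)

# Find the x where the derivative is zero; c(0, 4) is an assumed bracket
root <- uniroot(grad, interval = c(0, 4), tol = 1e-10)$root

# The built-in 1-D minimizer should agree with the closed-form answer
opt <- optimize(f, interval = c(0, 4))
print(c(root = root, minimum = opt$minimum, fmin = opt$objective))
```

Both values should land near $x = 2$, where the function value is 3.2.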

Page 5

Gradient Descent

But, if $\frac{df}{dx} = 0$ cannot be solved directly for $x$, what can we do? Start at some $x$ value, use the derivative at that value to tell us which way to move, and repeat. Gradient descent.

$\eta$ is a factor of the derivative to control how far to go.

$$\frac{df}{dx} = 2.4(x - 2)$$
$$x^{(0)} = 0 \text{ (for example)}$$
$$x^{(n)} = x^{(n-1)} - \eta \, 2.4(x^{(n-1)} - 2)$$

[Figure: plot of $1.2(x - 2)^2 + 3.2$ showing the closed-form solution and the gradient descent trace]
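A minimal sketch of how the step factor changes the behavior on this parabola. Writing the step factor as `eta` and subtracting 2 from both sides of the update gives $x^{(n)} - 2 = (1 - 2.4\,\mathrm{eta})(x^{(n-1)} - 2)$, so the distance to the minimum shrinks only when $|1 - 2.4\,\mathrm{eta}| < 1$. The helper name `descend` and the particular values of `eta` are illustrative assumptions.

```r
grad <- function(x) 2.4 * (x - 2)

# Hypothetical helper: run n gradient descent steps from x0 with step factor eta
descend <- function(x0, eta, n = 50) {
  x <- x0
  for (step in 1:n) {
    x <- x - eta * grad(x)
  }
  x
}

etas <- c(0.01, 0.1, 0.6, 0.8, 0.9)
finalX <- sapply(etas, function(eta) descend(0, eta))

# Each step multiplies the distance to x = 2 by (1 - 2.4 * eta)
print(data.frame(eta = etas, factor = 1 - 2.4 * etas, finalX = finalX))
```

With these values the run at `eta = 0.9` moves away from $x = 2$, since $|1 - 2.4 \cdot 0.9| = 1.16 > 1$, while the very small step factor is still short of the minimum after 50 steps.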

Page 8

For a parabola, we can get there much faster if we also know the second derivative, which is what?

$$\frac{df}{dx} = 2.4(x - 2)$$
$$\frac{d^2 f}{dx^2} = f''(x) = 2.4$$

and use Newton's method (see the Wikipedia entry for Newton's method):

$$x^{(n)} = x^{(n-1)} - \frac{f'(x^{(n-1)})}{f''(x^{(n-1)})}$$
$$x^{(n)} = x^{(n-1)} - \frac{2.4(x^{(n-1)} - 2)}{2.4}$$
$$x^{(n)} = x^{(n-1)} - (x^{(n-1)} - 2)$$

[Figure: plot of $1.2(x - 2)^2 + 3.2$ with the trace labeled "Newton's Gradient Descent"]
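To make "much faster" concrete, the sketch below counts how many updates each rule needs before the gradient magnitude falls below a tolerance. The helper `countSteps`, the tolerance `1e-8`, and the cap `maxSteps` are assumptions for illustration; the starting point 0.1 and step factor 0.6 are the values used in the R examples in these slides.

```r
grad <- function(x) 2.4 * (x - 2)
secondGrad <- function(x) 2.4

# Hypothetical helper: count updates until |gradient| drops below tol
countSteps <- function(x, update, tol = 1e-8, maxSteps = 1000) {
  n <- 0
  while (abs(grad(x)) > tol && n < maxSteps) {
    x <- update(x)
    n <- n + 1
  }
  n
}

gdSteps <- countSteps(0.1, function(x) x - 0.6 * grad(x))
newtonSteps <- countSteps(0.1, function(x) x - grad(x) / secondGrad(x))
print(c(gradientDescent = gdSteps, newton = newtonSteps))
```

On a parabola the Newton update simplifies to $x^{(n)} = 2$, so a single step reaches the minimum regardless of the starting point.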

Page 11

Gradient Descent

If the function is not a parabola, what can we do? We cannot solve directly for $x$. We can still do gradient descent.

Can we always use Newton's method? No.

Reason 1: If $x$ has 1000 components, the second derivative (Hessian) is a $1000 \times 1000$ matrix. It may be too big.

Reason 2: If the function is not a parabola, the second derivative information may lead you very far away. When?
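One case where the second derivative misleads, offered as an illustrative assumption rather than the example from the lecture, is $f(x) = \sqrt{1 + x^2}$. Its curvature becomes very small away from the minimum at $x = 0$, and the Newton update reduces algebraically to $x \leftarrow -x^3$, so any start with $|x| > 1$ is thrown farther away on every step. The helper name `newtonTrace` is hypothetical.

```r
# Assumed non-parabolic example: f(x) = sqrt(1 + x^2), minimum at x = 0
f <- function(x) sqrt(1 + x^2)
grad <- function(x) x / sqrt(1 + x^2)
secondGrad <- function(x) (1 + x^2)^(-1.5)

# Hypothetical helper: record the iterates of n Newton steps
newtonTrace <- function(x, n = 5) {
  xtrace <- x
  for (step in 1:n) {
    x <- x - grad(x) / secondGrad(x)  # algebraically equal to -x^3
    xtrace <- c(xtrace, x)
  }
  xtrace
}

near <- newtonTrace(0.5)  # starts where |x| < 1
far <- newtonTrace(1.1)   # starts where |x| > 1
print(rbind(near, far))
```

The trace started at 0.5 collapses toward 0, while the trace started at 1.1 alternates in sign and grows in magnitude.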

Page 14

Approximating the Second Derivative

Say we have picked a direction, $d$, to go. Rather than compute the second derivative in that direction, we can approximate it using two first derivative values.

$$f''(x)\,d \approx \frac{f'(x + \alpha d) - f'(x)}{\alpha} \quad \text{for } 0 < \alpha \ll 1$$

In practice, Moller found he had to modify this by adding $\lambda d$, where $\lambda$ is set to a value for which the resulting approximated second derivative is well behaved.

$$f''(x)\,d \approx \frac{f'(x + \alpha d) - f'(x)}{\alpha} + \lambda d \quad \text{for } 0 < \alpha \ll 1$$

This gives us a way to scale the step size.
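A minimal sketch of the two-gradient approximation on the parabola from the earlier slides, without the extra stabilizing term. The names `approxSecond`, `d`, `s`, and `stepSize`, the choice of the steepest-descent direction, and `alpha = 1e-4` are all assumptions for illustration. The step length used here is the one that minimizes a quadratic model along `d`.

```r
grad <- function(x) 2.4 * (x - 2)

# Approximate the second derivative along direction d from two gradient values
approxSecond <- function(x, d, alpha = 1e-4) {
  (grad(x + alpha * d) - grad(x)) / alpha
}

x <- 0.1
d <- -grad(x)            # assumed direction: steepest descent
s <- approxSecond(x, d)  # for this parabola, close to 2.4 * d

# Step length that minimizes the quadratic model along d
stepSize <- -(grad(x) * d) / (d * s)
xNew <- x + stepSize * d
print(c(s = s, exact = 2.4 * d, stepSize = stepSize, xNew = xNew))
```

Because the function is exactly quadratic, the approximation matches $2.4\,d$ up to rounding and the scaled step lands on $x = 2$.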

Page 17

Picking a Good Direction

Now, how about that direction? How do we decide that?

Moller uses conjugate gradients. (See the Wikipedia entry for conjugate gradient.)

The conjugate gradient direction is based on the previous direction and the current gradient.
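A minimal sketch of how a new direction can combine the current gradient with the previous direction. It uses the Fletcher-Reeves weighting and an exact step length on a small quadratic, which is a stand-in chosen for illustration and not necessarily the formula in Moller's algorithm. The matrix `A`, the vector `b`, and the starting point are assumed values.

```r
# Assumed 2-D quadratic bowl: f(x) = 0.5 * x' A x - b' x, so grad(x) = A x - b
A <- matrix(c(3, 1, 1, 2), nrow = 2)
b <- c(1, 1)
grad <- function(x) as.vector(A %*% x - b)

x <- c(0, 0)
g <- grad(x)
d <- -g  # first direction: steepest descent
for (step in 1:2) {
  # Exact step length along d (valid because f is quadratic)
  stepSize <- -sum(g * d) / sum(d * as.vector(A %*% d))
  x <- x + stepSize * d
  gNew <- grad(x)
  # Fletcher-Reeves weight on the previous direction
  beta <- sum(gNew * gNew) / sum(g * g)
  # New direction: negative current gradient plus a share of the previous direction
  d <- -gNew + beta * d
  g <- gNew
}
print(rbind(conjugateGradient = x, exact = solve(A, b)))
```

On a quadratic with two variables, two conjugate steps reach the minimizer, here `solve(A, b)`.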

Page 20

Parabola Example

    f <- function(x) 1.2 * (x - 2)^2 + 3.2
    grad <- function(x) 1.2 * 2 * (x - 2)
    secondGrad <- function(x) 2.4
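A hand-written gradient is easy to get wrong, so a finite-difference comparison is a cheap sanity check. The sketch below is an illustration only: the helper `numGrad`, the step `h = 1e-6`, and the test points are assumptions, and the gradient is written in its simplified form $2.4(x - 2)$.

```r
f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)

# Hypothetical helper: central-difference estimate of df/dx
numGrad <- function(x, h = 1e-6) (f(x + h) - f(x - h)) / (2 * h)

xs <- seq(0, 4, len = 9)
maxErr <- max(abs(numGrad(xs) - grad(xs)))
print(maxErr)
```

A large `maxErr` would point to a mistake in either `f` or `grad`.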

Page 21

Steepest Descent

    xs <- seq(0, 4, len = 20)
    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x-2)^2 + 3.2))
    ### df/dx = 2.4(x - 2)
    ### df/dx = 0
    ### 0 = 2.4x - 4.8
    ### x = 2
    lines(c(2, 2), c(3, 8), col = "red", lty = 2)
    text(2.1, 7, "Closed form solution", col = "red", pos = 4)
    ### gradient descent
    x <- 0.1
    xtrace <- x
    ftrace <- f(x)
    stepFactor <- 0.6  ### try larger and smaller values (0.8 and 0.01)
    for (step in 1:100) {
      x <- x - stepFactor * grad(x)
      xtrace <- c(xtrace, x)
      ftrace <- c(ftrace, f(x))
    }
    lines(xtrace, ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent", col = "blue", pos = 4)

Page 22

[Figure: plot of $1.2(x - 2)^2 + 3.2$ showing the closed-form solution and the gradient descent trace]

Page 23

Steepest Descent with gradientDescents.R

    source("gradientDescents.R")
    x <- 0.1
    result <- steepest(x, f, grad, stepsize = 0.6, nIterations = 100,
                       xtracep = TRUE, ftracep = TRUE)
    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x-2)^2 + 3.2))
    lines(result$xtrace, result$ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent with steepest()", col = "blue", pos = 4)
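The file gradientDescents.R is not reproduced in these slides. For readers without it, the following is a hypothetical stand-in for `steepest()` that accepts the same arguments as the call above and returns a list with `xtrace` and `ftrace`. The defaults, the extra return fields `x` and `f`, and the restriction to a scalar `x` are assumptions, so this should not be read as the author's implementation.

```r
# Hypothetical stand-in for steepest(); not the code from gradientDescents.R.
# Assumes x is a single number so traces can be built with c().
steepest <- function(x, f, grad, stepsize = 0.1, nIterations = 100,
                     xtracep = FALSE, ftracep = FALSE) {
  xtrace <- x
  ftrace <- f(x)
  for (step in seq_len(nIterations)) {
    x <- x - stepsize * grad(x)
    if (xtracep) xtrace <- c(xtrace, x)
    if (ftracep) ftrace <- c(ftrace, f(x))
  }
  result <- list(x = x, f = f(x))
  if (xtracep) result$xtrace <- xtrace
  if (ftracep) result$ftrace <- ftrace
  result
}

f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)
result <- steepest(0.1, f, grad, stepsize = 0.6, nIterations = 100,
                   xtracep = TRUE, ftracep = TRUE)
print(result$x)
```

The traces include the starting point, so each has `nIterations + 1` entries.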

Page 24

Steepest Descent scaled with Newton's Method

    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x-2)^2 + 3.2))
    x <- 0.1
    xtrace <- x
    ftrace <- f(x)
    for (step in 1:100) {
      x <- x - grad(x) / secondGrad(x)
      xtrace <- c(xtrace, x)
      ftrace <- c(ftrace, f(x))
    }
    lines(xtrace, ftrace, type = "b", col = "blue")
    text(0.5, 6, "Newton's Gradient Descent", col = "blue", pos = 4)

Page 25

With Scaled Conjugate Gradient from gradientDescents.R

    source("gradientDescents.R")
    x <- 0.1
    result <- scg(x, f, grad, nIterations = 100, xtracep = TRUE, ftracep = TRUE)
    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x-2)^2 + 3.2))
    lines(result$xtrace, result$ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent with scg()", col = "blue", pos = 4)

Page 26

Results

[Figure: four plots of $1.2(x - 2)^2 + 3.2$, labeled "Closed-form solution" with "Gradient Descent", "Gradient Descent with steepest()", "Newton's Gradient Descent", and "Gradient Descent with scg()"]
