Page 1
CS545: Gradient Descent
Chuck Anderson
Department of Computer Science, Colorado State University
Fall, 2009
Page 2
Outline
Gradient Descent
Parabola Examples in R
Page 3
Finding Minimum of Parabola

Find $x$ that is the minimum of $f(x) = 1.2(x - 2)^2 + 3.2$ or, said another way, find $\operatorname{argmin}_x f(x)$.

How? Yep. Take the derivative, set it equal to zero, and try to solve for $x$.

$$f(x) = 1.2(x - 2)^2 + 3.2$$
$$\frac{df}{dx} = 1.2(2)(x - 2) = 2.4(x - 2)$$
$$\frac{df}{dx} = 0 = 2.4(x - 2) \quad\Rightarrow\quad x = 2$$

[Figure: plot of $1.2(x - 2)^2 + 3.2$ with the closed-form solution marked.]
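The closed-form answer can be cross-checked numerically. The following is a minimal sketch, assuming the parabola $1.2(x - 2)^2 + 3.2$ from this slide; the name `dfdx` and the search interval `c(0, 4)` are choices made here for illustration. It uses base R's `uniroot()` to find where the derivative crosses zero.

```r
# Parabola from the slide and its derivative
f <- function(x) 1.2 * (x - 2)^2 + 3.2
dfdx <- function(x) 2.4 * (x - 2)

# Find the zero of the derivative on an assumed interval [0, 4]
root <- uniroot(dfdx, interval = c(0, 4))$root

# The root should be close to the closed-form solution x = 2,
# and f(root) close to the minimum value 3.2
print(c(root = root, fAtRoot = f(root)))
```

A root finder is only a sanity check here; for this parabola the algebra already gives the exact answer.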
Page 5
Gradient Descent

But if $\frac{df}{dx} = 0$ cannot be solved directly for $x$, what can we do?

Start at some $x$ value, use the derivative at that value to tell us which way to move, and repeat. This is gradient descent. $\eta$ is a factor applied to the derivative to control how far to go.

$$\frac{df}{dx} = 2.4(x - 2)$$
$$x^{(0)} = 0 \quad \text{(for example)}$$
$$x^{(n)} = x^{(n-1)} - \eta \, 2.4 \, (x^{(n-1)} - 2)$$

[Figure: plot of $1.2(x - 2)^2 + 3.2$ with the closed-form solution and the gradient descent steps marked.]
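A short worked derivation, offered as a sketch, shows why the step factor matters for this parabola. The symbol $\eta$ for the step factor and the iteration index $n$ are notational assumptions made here. Subtracting the minimizer $x = 2$ from both sides of the update $x^{(n)} = x^{(n-1)} - \eta \, 2.4 \, (x^{(n-1)} - 2)$ gives

$$x^{(n)} - 2 = (1 - 2.4\eta)\,(x^{(n-1)} - 2)$$

so every step multiplies the distance to the minimum by $1 - 2.4\eta$. The iteration therefore converges when $|1 - 2.4\eta| < 1$, that is, when $0 < \eta < 1/1.2 \approx 0.83$. For example, starting at $x^{(0)} = 0$ with $\eta = 0.6$:

$$x^{(1)} = 0 - 0.6 \cdot 2.4 \cdot (0 - 2) = 2.88, \qquad x^{(2)} = 2.88 - 0.6 \cdot 2.4 \cdot (2.88 - 2) = 1.6128$$

The iterates overshoot the minimum and land on alternating sides of $x = 2$, because the factor $1 - 2.4 \cdot 0.6 = -0.44$ is negative, but the distance shrinks on every step.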
Page 8
For a parabola, we can get there much faster if we also know the second derivative, which is what?

$$\frac{df}{dx} = 2.4(x - 2)$$
$$\frac{d^2 f}{dx^2} = f''(x) = 2.4$$

and use Newton's method (see the Wikipedia entry for Newton's method).

$$x^{(n)} = x^{(n-1)} - \frac{f'(x^{(n-1)})}{f''(x^{(n-1)})}$$
$$x^{(n)} = x^{(n-1)} - \frac{2.4\,(x^{(n-1)} - 2)}{2.4}$$
$$x^{(n)} = x^{(n-1)} - (x^{(n-1)} - 2)$$

[Figure: plot of $1.2(x - 2)^2 + 3.2$ with the Newton's gradient descent steps marked.]
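The last line of the Newton update simplifies to 2 for any starting value, so on this parabola a single Newton step lands on the minimum. A minimal sketch of that check follows; `newtonStep` and the starting values in `starts` are hypothetical names and values chosen here for illustration.

```r
# First and second derivatives of 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 2.4 * (x - 2)
secondGrad <- function(x) 2.4

# One Newton step: divide the gradient by the second derivative
newtonStep <- function(x) x - grad(x) / secondGrad(x)

# Try several arbitrary starting points (assumed values)
starts <- c(-10, 0.1, 3, 100)
afterOneStep <- sapply(starts, newtonStep)

# Each entry should equal 2 up to floating-point rounding
print(afterOneStep)
```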
Page 11
Gradient Descent

If the function is not a parabola, what can we do? We cannot solve directly for $x$. We can still do gradient descent.

Can we always use Newton's method? No.

Reason 1: If $x$ has 1000 components, the second derivative (Hessian) is a 1000 × 1000 matrix. May be too big.
Reason 2: If the function is not a parabola, the second derivative information may lead you very far away. When?
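One possible illustration of Reason 2, as a sketch: the function $g(x) = \sqrt{1 + x^2}$ is not from the slides and is chosen here only because its Newton step works out to $x - g'(x)/g''(x) = -x^3$. Far from the minimum at $x = 0$ the second derivative is close to zero, so dividing by it produces a huge step. The names `g`, `gGrad`, `gSecondGrad`, and `newtonTrace` are hypothetical.

```r
# Hypothetical non-parabola with its minimum at x = 0
g <- function(x) sqrt(1 + x^2)
gGrad <- function(x) x / sqrt(1 + x^2)
gSecondGrad <- function(x) (1 + x^2)^(-1.5)

# Apply nSteps Newton steps and record every x visited
newtonTrace <- function(x, nSteps) {
  xtrace <- x
  for (step in 1:nSteps) {
    x <- x - gGrad(x) / gSecondGrad(x)
    xtrace <- c(xtrace, x)
  }
  xtrace
}

near <- newtonTrace(0.5, 4)  # starts where |x| < 1: shrinks toward 0
far <- newtonTrace(2, 4)     # starts where |x| > 1: magnitude explodes
print(near)
print(far)
```

Plain gradient descent with a small step factor would still move toward $x = 0$ from either start, which is one reason to scale or safeguard the second derivative information rather than trust it blindly.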
Page 14
Approximating the Second Derivative

Say we have picked a direction, $d$, to go. Rather than compute the second derivative in that direction, we can approximate it using two first derivative values.

$$f''(x)\, d \approx \frac{f'(x + \alpha d) - f'(x)}{\alpha} \quad \text{for } 0 < \alpha \ll 1$$

In practice, Moller found he had to modify this by adding $\lambda d$, where $\lambda$ is set to a value for which the resulting approximated second derivative is well behaved.

$$f''(x)\, d \approx \frac{f'(x + \alpha d) - f'(x)}{\alpha} + \lambda d \quad \text{for } 0 < \alpha \ll 1$$

This gives us a way to scale the step size.
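The finite-difference idea can be tried on the parabola $1.2(x - 2)^2 + 3.2$ used elsewhere in these slides. This is a minimal one-dimensional sketch, not Moller's full algorithm: the symbols `d` (direction), `alpha`, and `lambda`, the choice of the negative gradient as direction, and the step-size formula from a local quadratic model are all assumptions made here for illustration.

```r
grad <- function(x) 1.2 * 2 * (x - 2)

x <- 0.1
d <- -grad(x)     # assumed direction: negative gradient
alpha <- 1e-6     # small offset, 0 < alpha << 1
lambda <- 0       # assumed: no extra damping needed for a parabola

# Approximate (second derivative times d) from two first-derivative values
s <- (grad(x + alpha * d) - grad(x)) / alpha + lambda * d

# In one dimension, dividing by d recovers the second derivative itself
curvature <- s / d

# Scale the step using the approximated curvature along d
stepSize <- -(d * grad(x)) / (d * s)
xNew <- x + stepSize * d

print(c(curvature = curvature, xNew = xNew))
```

For this parabola the gradient is linear, so the approximation should recover a curvature of about 2.4 and the scaled step should land near $x = 2$.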
Page 17
Picking a Good Direction

Now, how about that direction? How do we decide that?

Moller uses conjugate gradients. (See the Wikipedia entry for conjugate gradient.)

The conjugate gradient direction is based on the previous direction and the current gradient.
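How a direction can be built from the previous direction and the current gradient is sketched below. This is the textbook conjugate gradient iteration with the Fletcher-Reeves coefficient and an exact line search, which is an assumption here and not necessarily the formula Moller uses. The two-dimensional quadratic defined by `A` and `b` is hypothetical, chosen so the result can be compared with `solve(A, b)`.

```r
# Hypothetical quadratic: 0.5 * t(x) %*% A %*% x - sum(b * x)
A <- matrix(c(3, 1, 1, 2), nrow = 2)  # symmetric positive definite
b <- c(1, 1)
gradQ <- function(x) as.vector(A %*% x - b)

x <- c(0, 0)
g <- gradQ(x)
d <- -g  # first direction is plain steepest descent

for (step in 1:2) {
  Ad <- as.vector(A %*% d)
  stepSize <- -sum(g * d) / sum(d * Ad)  # exact line search for a quadratic
  x <- x + stepSize * d
  gNew <- gradQ(x)
  beta <- sum(gNew * gNew) / sum(g * g)  # Fletcher-Reeves coefficient
  d <- -gNew + beta * d  # current gradient combined with previous direction
  g <- gNew
}

print(x)            # result after two conjugate gradient steps
print(solve(A, b))  # direct solution for comparison
```

On an n-dimensional quadratic with exact line searches, this iteration reaches the minimum in at most n steps, which is why two steps suffice here.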
Page 20
Parabola Example

    # The parabola, its first derivative, and its (constant) second derivative
    f <- function(x) 1.2 * (x - 2)^2 + 3.2
    grad <- function(x) 1.2 * 2 * (x - 2)
    secondGrad <- function(x) 2.4
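Before using a hand-written gradient, it can help to compare it against a numerical estimate. The following is a minimal sketch, assuming the definitions of `f` and `grad` for $1.2(x - 2)^2 + 3.2$; the offset `h`, the test points in `checkPoints`, and the central-difference formula are choices made here for illustration.

```r
f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 1.2 * 2 * (x - 2)

h <- 1e-5                           # assumed small offset
checkPoints <- seq(0, 4, len = 9)   # assumed test points

# Central-difference estimate of the derivative at each test point
numericGrad <- (f(checkPoints + h) - f(checkPoints - h)) / (2 * h)

# Largest disagreement between the numerical and analytic gradients
maxError <- max(abs(numericGrad - grad(checkPoints)))
print(maxError)
```

A large `maxError` would point to a mistake in `grad`, such as a dropped factor of 2.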
Page 21
Steepest Descent

    xs <- seq(0, 4, len = 20)
    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x - 2)^2 + 3.2))

    ### df/dx = 2.4(x - 2)
    ### df/dx = 0  ->  0 = 2.4x - 4.8  ->  x = 2
    lines(c(2, 2), c(3, 8), col = "red", lty = 2)
    text(2.1, 7, "Closed form solution", col = "red", pos = 4)

    ### gradient descent
    x <- 0.1
    xtrace <- x
    ftrace <- f(x)
    stepFactor <- 0.6  ### try larger and smaller values (0.8 and 0.01)
    for (step in 1:100) {
      x <- x - stepFactor * grad(x)
      xtrace <- c(xtrace, x)
      ftrace <- c(ftrace, f(x))
    }
    lines(xtrace, ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent", col = "blue", pos = 4)
Page 22
[Figure: plot of $1.2(x - 2)^2 + 3.2$ with the closed-form solution and the gradient descent steps marked.]
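The suggestion to try larger and smaller step factors can be followed without plotting. This is a minimal sketch that runs 100 steps from `x = 0.1` for each step factor and reports how far the final `x` is from the minimum at 2. The values 0.01, 0.6, and 0.8 come from the comment in the steepest descent code; 0.9 is an extra value added here as an assumption to show a step factor that is too large. The helper `distanceAfter` is a hypothetical name.

```r
f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 1.2 * 2 * (x - 2)

# Run plain gradient descent from x = 0.1 and return |x - 2| at the end
distanceAfter <- function(stepFactor, nSteps = 100) {
  x <- 0.1
  for (step in 1:nSteps) {
    x <- x - stepFactor * grad(x)
  }
  abs(x - 2)
}

stepFactors <- c(0.01, 0.6, 0.8, 0.9)  # 0.9 is not from the slides
distances <- sapply(stepFactors, distanceAfter)
print(data.frame(stepFactor = stepFactors, distance = distances))
```

The smallest step factor is still noticeably short of the minimum after 100 steps, 0.6 and 0.8 both converge (0.8 more slowly, since it overshoots more), and 0.9 moves farther away on every step.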
Page 23
Steepest Descent with gradientDescents.R

    source("gradientDescents.R")

    x <- 0.1
    result <- steepest(x, f, grad, stepsize = 0.6, nIterations = 100,
                       xtracep = TRUE, ftracep = TRUE)

    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x - 2)^2 + 3.2))
    lines(result$xtrace, result$ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent with steepest()", col = "blue", pos = 4)
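The contents of gradientDescents.R are not shown in these slides. For readers without that file, the following is a hypothetical stand-in with the same argument names as the `steepest()` call, written only from how the call and its result are used here. It is a sketch for scalar `x`, not the course's actual implementation, and the returned fields `x` and `f` are assumptions.

```r
# Hypothetical stand-in; NOT the code from gradientDescents.R. Scalar x only.
steepestSketch <- function(x, f, grad, stepsize = 0.1, nIterations = 100,
                           xtracep = FALSE, ftracep = FALSE) {
  # Start the traces with the initial point when tracing is requested
  xtrace <- if (xtracep) x else NULL
  ftrace <- if (ftracep) f(x) else NULL
  for (step in seq_len(nIterations)) {
    x <- x - stepsize * grad(x)  # move against the gradient
    if (xtracep) xtrace <- c(xtrace, x)
    if (ftracep) ftrace <- c(ftrace, f(x))
  }
  list(x = x, f = f(x), xtrace = xtrace, ftrace = ftrace)
}

f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 1.2 * 2 * (x - 2)
result <- steepestSketch(0.1, f, grad, stepsize = 0.6, nIterations = 100,
                         xtracep = TRUE, ftracep = TRUE)
print(result$x)
```

With the traces switched on, `result$xtrace` and `result$ftrace` hold the starting point plus one entry per iteration, which is the shape the `lines()` call needs.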
Page 24
Steepest Descent scaled with Newton's Method

    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x - 2)^2 + 3.2))

    x <- 0.1
    xtrace <- x
    ftrace <- f(x)
    for (step in 1:100) {
      # Newton step: scale the gradient by the second derivative
      x <- x - grad(x) / secondGrad(x)
      xtrace <- c(xtrace, x)
      ftrace <- c(ftrace, f(x))
    }
    lines(xtrace, ftrace, type = "b", col = "blue")
    text(0.5, 6, "Newton's Gradient Descent", col = "blue", pos = 4)
Page 25
With Scaled Conjugate Gradient from gradientDescents.R

    source("gradientDescents.R")

    x <- 0.1
    result <- scg(x, f, grad, nIterations = 100, xtracep = TRUE, ftracep = TRUE)

    plot(xs, f(xs), type = "l", xlab = "x", ylab = expression(1.2(x - 2)^2 + 3.2))
    lines(result$xtrace, result$ftrace, type = "b", col = "blue")
    text(0.5, 6, "Gradient Descent with scg()", col = "blue", pos = 4)
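The `scg()` function lives in gradientDescents.R, which is not reproduced here. As a rough point of comparison only, base R's `optim()` offers a conjugate gradient method. The sketch below assumes that substitution: `method = "CG"` is a Fletcher-Reeves style conjugate gradient with a line search, not Moller's scaled conjugate gradient, and it does not return the `xtrace` and `ftrace` fields used above.

```r
f <- function(x) 1.2 * (x - 2)^2 + 3.2
grad <- function(x) 1.2 * 2 * (x - 2)

# Base-R conjugate gradient as a stand-in for scg(); not the same algorithm
fit <- optim(par = 0.1, fn = f, gr = grad, method = "CG")

print(fit$par)     # should be close to the closed-form solution x = 2
print(fit$value)   # should be close to the minimum value 3.2
print(fit$counts)  # number of function and gradient evaluations used
```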
Page 26
Results

[Figure: four plots of $1.2(x - 2)^2 + 3.2$, labeled "Closed-form solution" with "Gradient Descent", "Gradient Descent with steepest()", "Newton's Gradient Descent", and "Gradient Descent with scg()".]