Properties of the Trace and Matrix Derivatives

John Duchi

Contents

1 Notation
2 Matrix multiplication
3 Gradient of linear function
4 Derivative in a trace
5 Derivative of product in trace
6 Derivative of function of a matrix
7 Derivative of linear transformed input to function
8 Funky trace derivative
9 Symmetric Matrices and Eigenvectors

1 Notation

A few things on notation (which may not be very consistent, actually): the columns of a matrix $A$ are $a_1$ through $a_n$, while the rows are given (as vectors) by $\tilde{a}_1^\top$ through $\tilde{a}_n^\top$.

2 Matrix multiplication

First, consider a matrix $A \in \mathbb{R}^{n \times n}$. We have that
$$A^\top A = \sum_{i=1}^n \tilde{a}_i \tilde{a}_i^\top,$$
that is, the product $A^\top A$ is the sum of the outer products of the columns of $A^\top$ (equivalently, of the rows of $A$). To see this, consider that
$$(A^\top A)_{ij} = \sum_{p=1}^n a_{pi}\, a_{pj},$$
because the $i,j$ element is the $i$th row of $A^\top$, which is the vector $(a_{1i}, a_{2i}, \ldots, a_{ni})$, dotted with the $j$th column of $A$, which is $(a_{1j}, \ldots, a_{nj})$.
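This identity is easy to sanity-check numerically. Below is a minimal sketch using NumPy; the notes themselves contain no code, so the library choice and variable names are my own:

```python
import numpy as np

# A random square matrix; the rows of A are the columns of A^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))

# Sum of outer products of the rows of A.
outer_sum = sum(np.outer(A[i, :], A[i, :]) for i in range(A.shape[0]))

# Agrees with the matrix product A^T A.
assert np.allclose(A.T @ A, outer_sum)
```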

If we look at the matrix $A^\top A$, we see that
$$A^\top A = \begin{bmatrix} \sum_{p=1}^n a_{p1} a_{p1} & \cdots & \sum_{p=1}^n a_{p1} a_{pn} \\ \vdots & \ddots & \vdots \\ \sum_{p=1}^n a_{pn} a_{p1} & \cdots & \sum_{p=1}^n a_{pn} a_{pn} \end{bmatrix} = \sum_{i=1}^n \begin{bmatrix} a_{i1} \\ \vdots \\ a_{in} \end{bmatrix} \begin{bmatrix} a_{i1} & \cdots & a_{in} \end{bmatrix} = \sum_{i=1}^n \tilde{a}_i \tilde{a}_i^\top.$$

3 Gradient of linear function

Consider $Ax$, where $A \in \mathbb{R}^{m \times n}$ and $x \in \mathbb{R}^n$. We have
$$\nabla_x\, Ax = A^\top.$$
Now let us consider $x^\top A x$ for $A \in \mathbb{R}^{n \times n}$. We have that
$$x^\top A x = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\, x_i x_j.$$
If we take the derivative with respect to one of the $x_l$'s, we have the $a_{il} x_i$ component for each $i$, which is to say $\sum_{i=1}^n a_{il} x_i$, and the $a_{lj} x_j$ term for each $j$, which gives us that
$$\frac{\partial}{\partial x_l}\, x^\top A x = \sum_{i=1}^n a_{il}\, x_i + \sum_{j=1}^n a_{lj}\, x_j.$$
In the end, we see that
$$\nabla_x\, x^\top A x = A^\top x + A x.$$

4 Derivative in a trace

Recall (as in Old and New Matrix Algebra Useful for Statistics) that we can define the differential of a function $f(x)$ to be the part of $f(x + dx) - f(x)$ that is linear in $dx$, i.e., is a constant times $dx$. Then, for example, for a vector-valued function $f$, we can have
$$f(x + dx) = f(x) + f'(x)\, dx + (\text{higher order terms}).$$
In the above, $f'$ is the derivative (or Jacobian). Note that the gradient is the transpose of the Jacobian.

Consider an arbitrary matrix $A$. We see that
$$\mathrm{tr}(A\, dX) = \sum_{i=1}^n \tilde{a}_i^\top\, dx_i,$$
where $dx_i$ denotes the $i$th column of $dX$. Thus, we have
$$\left[\frac{\mathrm{tr}(A\, dX)}{dX}\right]_{ij} = \frac{\partial}{\partial (dX)_{ji}} \sum_{k=1}^n \tilde{a}_k^\top\, dx_k = a_{ij},$$
so that
$$\frac{\mathrm{tr}(A\, dX)}{dX} = A.$$
Note that this is the Jacobian formulation.
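Both results above, the gradient of the quadratic form and the trace/Jacobian identity, can be checked by finite differences. A sketch in NumPy (an assumption on my part; the notes contain no code):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
eps = 1e-6

# Finite-difference gradient of f(x) = x^T A x.
grad_fd = np.zeros(n)
for l in range(n):
    e = np.zeros(n)
    e[l] = eps
    grad_fd[l] = ((x + e) @ A @ (x + e) - (x - e) @ A @ (x - e)) / (2 * eps)

# Matches the closed form A^T x + A x.
assert np.allclose(grad_fd, A.T @ x + A @ x, atol=1e-5)

# For g(X) = tr(AX): perturbing X_{ji} changes the trace at rate a_{ij},
# so the array of partials, laid out with (j, i) indexing, equals A^T
# (the Jacobian is A; the gradient is its transpose).
X = rng.standard_normal((n, n))
grad_tr = np.zeros((n, n))
for j in range(n):
    for i in range(n):
        E = np.zeros((n, n))
        E[j, i] = eps
        grad_tr[j, i] = (np.trace(A @ (X + E)) - np.trace(A @ X)) / eps
assert np.allclose(grad_tr, A.T, atol=1e-5)
```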

5 Derivative of product in trace

In this section, we prove that
$$\nabla_A\, \mathrm{tr}(AB) = B^\top.$$
Writing out the product in terms of the rows $\tilde{a}_i^\top$ of $A$ and the columns $b_j$ of $B$,
$$\mathrm{tr}(AB) = \mathrm{tr} \begin{bmatrix} \tilde{a}_1^\top b_1 & \tilde{a}_1^\top b_2 & \cdots & \tilde{a}_1^\top b_n \\ \tilde{a}_2^\top b_1 & \tilde{a}_2^\top b_2 & \cdots & \tilde{a}_2^\top b_n \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{a}_n^\top b_1 & \tilde{a}_n^\top b_2 & \cdots & \tilde{a}_n^\top b_n \end{bmatrix} = \sum_{i=1}^n \tilde{a}_i^\top b_i = \sum_{i=1}^n \sum_{j=1}^n a_{ij}\, b_{ji},$$
so that
$$\frac{\partial}{\partial a_{ij}}\, \mathrm{tr}(AB) = b_{ji}, \qquad \text{i.e.} \qquad \nabla_A\, \mathrm{tr}(AB) = B^\top.$$

6 Derivative of function of a matrix

Here we prove that
$$\nabla_{A^\top} f(A) = \left(\nabla_A f(A)\right)^\top.$$
Indeed,
$$\nabla_{A^\top} f(A) = \begin{bmatrix} \frac{\partial f}{\partial A_{11}} & \frac{\partial f}{\partial A_{21}} & \cdots & \frac{\partial f}{\partial A_{n1}} \\ \frac{\partial f}{\partial A_{12}} & \frac{\partial f}{\partial A_{22}} & \cdots & \frac{\partial f}{\partial A_{n2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial A_{1n}} & \frac{\partial f}{\partial A_{2n}} & \cdots & \frac{\partial f}{\partial A_{nn}} \end{bmatrix} = \left(\nabla_A f(A)\right)^\top.$$

7 Derivative of linear transformed input to function

Consider a function $f : \mathbb{R}^n \to \mathbb{R}$. Suppose we have a matrix $A \in \mathbb{R}^{n \times n}$ and a vector $x \in \mathbb{R}^n$. We wish to compute $\nabla_x f(Ax)$. By the chain rule, we have
$$\frac{\partial f(Ax)}{\partial x_i} = \sum_{k=1}^n \frac{\partial f(Ax)}{\partial (Ax)_k} \cdot \frac{\partial (Ax)_k}{\partial x_i} = \sum_{k=1}^n \frac{\partial f(Ax)}{\partial (Ax)_k} \cdot \frac{\partial\, (\tilde{a}_k^\top x)}{\partial x_i} = \sum_{k=1}^n \frac{\partial f(Ax)}{\partial (Ax)_k}\, a_{ki} = a_i^\top \nabla f(Ax),$$
where $a_i$ is the $i$th column of $A$ and $\nabla f(Ax)$ denotes the gradient of $f$ evaluated at $Ax$.

As such, $\nabla_x f(Ax) = A^\top \nabla f(Ax)$. Now, if we would like to get the second derivative of this function (third derivatives would be nice, but I do not like tensors), we have
$$\frac{\partial^2 f(Ax)}{\partial x_i\, \partial x_j} = \frac{\partial}{\partial x_j} \sum_{k=1}^n a_{ki}\, \frac{\partial f(Ax)}{\partial (Ax)_k} = \sum_{k=1}^n \sum_{l=1}^n a_{ki}\, a_{lj}\, \frac{\partial^2 f(Ax)}{\partial (Ax)_k\, \partial (Ax)_l}.$$
From this, it is easy to see that
$$\nabla_x^2 f(Ax) = A^\top\, \nabla^2 f(Ax)\, A.$$

8 Funky trace derivative

In this section, we prove that
$$\nabla_A\, \mathrm{tr}(ABA^\top C) = CAB + C^\top A B^\top.$$
In this bit, let us have $f(A) = AB$, where $f$ is matrix-valued. Treating each occurrence of $A$ in turn (writing $\circ$ for the occurrence being differentiated, evaluated at $\circ = A$),
$$\begin{aligned}
\nabla_A\, \mathrm{tr}(ABA^\top C) &= \nabla_A\, \mathrm{tr}\left(f(A)\, A^\top C\right) \\
&= \nabla_\circ\, \mathrm{tr}\left(f(\circ)\, A^\top C\right) + \nabla_\circ\, \mathrm{tr}\left(f(A)\, \circ^\top C\right) \\
&= \left(B A^\top C\right)^\top + \left(\nabla_{\circ^\top}\, \mathrm{tr}\left(f(A)\, \circ^\top C\right)\right)^\top \\
&= C^\top A B^\top + \left(\left(C f(A)\right)^\top\right)^\top \\
&= C^\top A B^\top + C A B,
\end{aligned}$$
where the first term uses $\nabla_A \mathrm{tr}(AB) = B^\top$ from Section 5 (with $B A^\top C$ in place of $B$), and the second term uses Section 6 together with the cyclic property of the trace, $\mathrm{tr}(f(A)\, \circ^\top C) = \mathrm{tr}(\circ^\top\, C f(A))$.

9 Symmetric Matrices and Eigenvectors

In this section we prove that for a symmetric matrix $A \in \mathbb{S}^n$, all the eigenvalues are real, and that the eigenvectors of $A$ form an orthonormal basis of $\mathbb{R}^n$.

First, we prove that the eigenvalues are real. Suppose one is complex: for an eigenpair $(\lambda, x)$ with $x \neq 0$ we have
$$\lambda\, \bar{x}^\top x = \bar{x}^\top (A x) = \left(A \bar{x}\right)^\top x = \bar{\lambda}\, \bar{x}^\top x,$$
and since $\bar{x}^\top x \neq 0$, this gives $\lambda = \bar{\lambda}$. Thus, all the eigenvalues are real.

Now, we suppose we have at least one eigenvector $v \neq 0$ of $A$, with eigenvalue $\lambda$. Consider the space $W$ of vectors orthogonal to $v$. We then have that, for $w \in W$,
$$v^\top (A w) = (A v)^\top w = \lambda\, v^\top w = 0.$$

Thus, we have a set of vectors that, when transformed by $A$, are still orthogonal to $v$, so if we have an original eigenvector of $A$, then a simple inductive argument (restricting $A$ to $W$ and repeating) shows that there is an orthonormal set of eigenvectors. To see that there is at least one eigenvector, consider the characteristic polynomial of $A$:
$$\chi_A(\lambda) = \det(A - \lambda I).$$
The field $\mathbb{C}$ is algebraically closed, so there is at least one complex root $r$, so we have that $A - rI$ is singular and there is a vector $v \neq 0$ that is an eigenvector of $A$. By the argument above, $r$ is a real eigenvalue, so we have the base case for our induction, and the proof is complete.
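As a closing numerical sanity check of the last two sections, the funky trace derivative and the spectral claims can both be verified with NumPy (a sketch under my own choice of library and names; the notes contain no code):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = rng.standard_normal((n, n))
eps = 1e-6

# Section 8: central-difference gradient of g(A) = tr(A B A^T C).
grad_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        E = np.zeros((n, n))
        E[i, j] = eps
        g_plus = np.trace((A + E) @ B @ (A + E).T @ C)
        g_minus = np.trace((A - E) @ B @ (A - E).T @ C)
        grad_fd[i, j] = (g_plus - g_minus) / (2 * eps)
assert np.allclose(grad_fd, C @ A @ B + C.T @ A @ B.T, atol=1e-4)

# Section 9: a symmetric matrix has real eigenvalues and an
# orthonormal basis of eigenvectors (np.linalg.eigh assumes symmetry).
S = A + A.T
lam, V = np.linalg.eigh(S)
assert np.isrealobj(lam)                       # real eigenvalues
assert np.allclose(V.T @ V, np.eye(n))         # orthonormal eigenvectors
assert np.allclose(S, V @ np.diag(lam) @ V.T)  # spectral decomposition
```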
