The Elements of Linear Algebra
Alexander G. Ororbia II
The Pennsylvania State University
IST 597: Foundations of Deep Learning
About this chapter
Not a comprehensive survey of all of linear algebra
Focused on the subset most relevant to deep learning
Larger subset: e.g., Linear Algebra by Georgi E. Shilov
Scalars
A scalar is a single number
Integers, real numbers, rational numbers, etc.
Denoted with italic font, e.g., a, n, x
Vectors
A vector is a 1-D array of numbers:
Can be real, binary, integer, etc.
Example notation for type and size: x ∈ ℝⁿ
Matrices
A matrix is a 2-D array of numbers:
Example notation for type and shape: A ∈ ℝ^(m×n)
Tensors
A tensor is an array of numbers, that may have
zero dimensions, and be a scalar
one dimension, and be a vector
two dimensions, and be a matrix
or more dimensions.
Tensor = Multidimensional Array
(Image credit: https://www.slideshare.net/BertonEarnshaw/a-brief-survey-of-tensors)
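As a quick illustration (a NumPy sketch; the array values here are made up for the example), the number of dimensions is what distinguishes scalars, vectors, matrices, and higher-order tensors:

```python
import numpy as np

s = np.array(3.0)         # zero dimensions: a scalar
v = np.array([1.0, 2.0])  # one dimension: a vector
M = np.eye(2)             # two dimensions: a matrix
T = np.zeros((2, 3, 4))   # three dimensions: a higher-order tensor
dims = (s.ndim, v.ndim, M.ndim, T.ndim)  # (0, 1, 2, 3)
```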
Matrix Transpose
The transpose swaps rows and columns: (Aᵀ)ᵢ,ⱼ = Aⱼ,ᵢ
Matrix (Dot) Product
C = AB: an m×n matrix A times an n×p matrix B gives an m×p matrix C.
The inner dimensions (n) must match.
Matrix Addition/Subtraction
Assume column-major matrices (for efficiency)
Add/subtract operators follow basic properties of normal add/subtract
Matrix A + Matrix B is computed element-wise, e.g., with A = B = [[0.5, -0.7], [-0.69, 1.8]]:
0.5 + 0.5 = 1.0      -0.7 + -0.7 = -1.4
-0.69 + -0.69 = -1.38      1.8 + 1.8 = 3.6
so A + B = [[1.0, -1.4], [-1.38, 3.6]]
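The worked example above can be sketched in NumPy (assuming NumPy as the implementation language; the values are the ones from the slide):

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
C = A + A  # element-wise: C[i, j] = A[i, j] + A[i, j]
# C is [[1.0, -1.4], [-1.38, 3.6]]
```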
Matrix-Matrix Multiply
Each entry of a matrix-matrix product is a vector-vector multiply (dot product) of a row of the first matrix with a column of the second
The usual workhorse of learning algorithms
Vectorizes sums of products, e.g., with A = B = [[0.5, -0.7], [-0.69, 1.8]]:
(0.5 · 0.5) + (-0.7 · -0.69) = 0.733      (0.5 · -0.7) + (-0.7 · 1.8) = -1.61
(-0.69 · 0.5) + (1.8 · -0.69) = -1.587      (-0.69 · -0.7) + (1.8 · 1.8) = 3.723
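The same product can be sketched in NumPy (the `@` operator is matrix multiply; values from the slide):

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
C = A @ A  # each C[i, j] is the dot product of row i of A with column j of A
# C is [[0.733, -1.61], [-1.587, 3.723]]
```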
Hadamard Product
Multiply each A(i, j) by each corresponding B(i, j)
Element-wise multiplication, written A ⊙ B, e.g., with A = B = [[0.5, -0.7], [-0.69, 1.8]]:
0.5 · 0.5 = 0.25      -0.7 · -0.7 = 0.49
-0.69 · -0.69 = 0.4761      1.8 · 1.8 = 3.24
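In NumPy, plain `*` on two arrays is the Hadamard (element-wise) product, not the matrix product (values from the slide):

```python
import numpy as np

A = np.array([[0.5, -0.7],
              [-0.69, 1.8]])
H = A * A  # element-wise: H[i, j] = A[i, j] * A[i, j]
# H is [[0.25, 0.49], [0.4761, 3.24]]
```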
Elementwise Functions
Applied to each element (i, j) of a matrix argument
Could be cos(·), sin(·), tanh(·), etc.
Identity: φ(v) = v
Logistic Sigmoid: φ(v) = 1 / (1 + e^(−v))
Linear Rectifier: φ(v) = max(0, v)
Softmax: φ(v)ᵢ = e^(vᵢ) / Σⱼ e^(vⱼ)
Example, applying the linear rectifier element-wise:
φ(1.0) = 1.0      φ(-1.4) = 0
φ(-1.38) = 0      φ(1.8) = 1.8
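The functions listed above could be sketched as follows (a NumPy sketch; the function names are my own, and the max-shift in softmax is a standard numerical-stability trick not shown on the slide):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))      # logistic sigmoid, element-wise

def relu(v):
    return np.maximum(0.0, v)            # linear rectifier, element-wise

def softmax(v):
    e = np.exp(v - np.max(v))            # shift by max for numerical stability
    return e / e.sum()                   # normalizes to a probability vector

v = np.array([1.0, -1.4, -1.38, 1.8])
out = relu(v)  # [1.0, 0.0, 0.0, 1.8], matching the worked example
```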
Why do we care?
Computation Graphs
Linear algebra operators arranged in a directed graph!
Vector Form (One Unit)
h = φ(w · x) = φ(w₁x₁ + w₂x₂ + w₃x₃)
This calculates the activation value of a single hidden unit that is connected to 3 sensors.
Vector Form (Two Units)
h = φ(W x), computing φ(w₁,₁x₁ + w₁,₂x₂ + w₁,₃x₃) and φ(w₂,₁x₁ + w₂,₂x₂ + w₂,₃x₃) in one matrix-vector product
This vectorization easily generalizes to multiple sensors feeding into multiple units.
Known as vectorization!
Now Let Us Fully Vectorize This!
H = φ(W X), where each column of X is one input vector: every hidden unit's sum of products for every input is computed in a single matrix-matrix product.
This vectorization is also important for formulating mini-batches.
(Good for GPU-based processing.)
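A minimal sketch of this fully vectorized form, assuming NumPy, a rectifier nonlinearity, and made-up shapes (3 sensors, 2 hidden units, a mini-batch of 4 examples):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))   # one row of weights per hidden unit
X = rng.standard_normal((3, 4))   # one column per example in the mini-batch
H = np.maximum(0.0, W @ X)        # phi = linear rectifier, applied element-wise
# H has shape (2, 4): one activation per (hidden unit, example) pair
```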
Identity Matrix
Multiplying by the identity matrix leaves a vector unchanged: Iₙ x = x
Systems of Equations
A x = b expands to
A₁,₁x₁ + A₁,₂x₂ + … + A₁,ₙxₙ = b₁
A₂,₁x₁ + A₂,₂x₂ + … + A₂,ₙxₙ = b₂
…
Solving Systems of Equations
A linear system of equations can have:
No solution
Many solutions
Exactly one solution: this means multiplication by the matrix is an invertible function
Matrix Inversion
Matrix inverse: A⁻¹A = I
Solving a system using an inverse: x = A⁻¹b
Numerically unstable, but useful for abstract analysis
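Both routes can be sketched in NumPy on a hypothetical 2×2 system; `np.linalg.solve` factorizes the matrix rather than forming the inverse explicitly, which is why it is preferred in practice:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x_inv = np.linalg.inv(A) @ b     # explicit inverse: fine for analysis on paper
x_solve = np.linalg.solve(A, b)  # factorization-based: preferred numerically
```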
Invertibility
Matrix can’t be inverted if…
More rows than columns
More columns than rows
Redundant rows/columns (“linearly dependent”, “low rank”)
Norms
Functions that measure how “large” a vector is
Similar to a distance between zero and the point represented by the vector
Lᵖ Norms
‖x‖ₚ = (Σᵢ |xᵢ|ᵖ)^(1/p)
Most popular norm: L² norm, p = 2 (Euclidean): ‖x‖₂ = √(Σᵢ xᵢ²)
L¹ norm, p = 1 (Manhattan): ‖x‖₁ = Σᵢ |xᵢ|
Max norm, infinite p: ‖x‖∞ = maxᵢ |xᵢ|
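All three norms are available through `np.linalg.norm` (the example vector is made up):

```python
import numpy as np

x = np.array([3.0, -4.0])
l2 = np.linalg.norm(x)                # Euclidean: sqrt(9 + 16) = 5
l1 = np.linalg.norm(x, ord=1)         # Manhattan: |3| + |-4| = 7
linf = np.linalg.norm(x, ord=np.inf)  # max norm: max(|3|, |-4|) = 4
```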
Special Matrices and Vectors
Unit vector: ‖x‖₂ = 1
Symmetric matrix: A = Aᵀ
Orthogonal matrix: AᵀA = AAᵀ = I
Eigendecomposition
Eigenvector and eigenvalue: A v = λ v
Eigendecomposition of a diagonalizable matrix: A = V diag(λ) V⁻¹
Every real symmetric matrix has a real, orthogonal eigendecomposition: A = Q Λ Qᵀ
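For a real symmetric matrix, `np.linalg.eigh` returns exactly this orthogonal eigendecomposition (the example matrix is made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])           # real and symmetric
lam, Q = np.linalg.eigh(A)           # eigenvalues (ascending) and orthogonal Q
A_rebuilt = Q @ np.diag(lam) @ Q.T   # A = Q diag(lambda) Q^T
```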
Effect of Eigenvalues
Singular Value Decomposition
Similar to eigendecomposition: A = U D Vᵀ
More general; matrix need not be square
Moore-Penrose Pseudoinverse
If the equation A x = b has:
Exactly one solution: this is the same as the inverse.
No solution: this gives us the solution with the smallest error.
Many solutions: this gives us the solution with the smallest norm of x.
Computing the Pseudoinverse
The SVD allows the computation of the pseudoinverse: A⁺ = V D⁺ Uᵀ
D⁺: take the reciprocal of the non-zero entries of D, then transpose
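This recipe can be sketched with NumPy's SVD on a hypothetical tall (hence non-invertible) matrix, and checked against the library routine `np.linalg.pinv`:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [0.0, 0.0]])  # more rows than columns: no ordinary inverse

U, d, Vt = np.linalg.svd(A)  # A = U D V^T; d holds the singular values
# D^+ : reciprocal of the non-zero singular values, placed in the
# transposed shape (columns-of-A by rows-of-A)
recip = np.array([1.0 / s if s > 1e-12 else 0.0 for s in d])
D_plus = np.zeros((A.shape[1], A.shape[0]))
np.fill_diagonal(D_plus, recip)
A_pinv = Vt.T @ D_plus @ U.T  # A^+ = V D^+ U^T
```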
Trace
Tr(A) = Σᵢ Aᵢ,ᵢ, the sum of the diagonal entries
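In NumPy the trace is a one-liner (the example matrix is made up):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
t = np.trace(A)  # sum of the diagonal: 1 + 4 = 5
```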
Learning Linear Algebra
Do a lot of practice problems
Linear Algebra Done Right: http://www.springer.com/us/book/9783319110790
Linear Algebra for Dummies: http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470430907.html
Start out with lots of summation signs and indexing into individual entries
Code up a few basic matrix operations and compare to worked-out solutions
Eventually you will be able to mostly use matrix and vector product notation quickly and easily
References
This presentation is a variation of Ian Goodfellow's slides for Chapter 2 of Deep Learning (http://www.deeplearningbook.org/lecture_slides.html).