Slide 1: Matrix Factorization
Slide 2: Recovering latent factors in a matrix

[Figure: an n-row by m-column matrix V with entries v11 … vij … vnm]
Slide 3: Recovering latent factors in a matrix

[Figure: V (n x m) ≈ W (n x K) times H (K x m), where W has a row (xi, yi) per row of V and H has a column (aj, bj) per column of V]
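The factorization pictured on this slide, V ≈ W H with K much smaller than n and m, can be written as a minimal numpy sketch (toy sizes and variable names are assumptions):

```python
import numpy as np

# A rank-K matrix built from its factors: V is n x m, W is n x K, H is K x m.
rng = np.random.default_rng(0)
n, m, K = 6, 5, 2

W = rng.normal(size=(n, K))   # one K-dim vector per row of V
H = rng.normal(size=(K, m))   # one K-dim vector per column of V
V = W @ H                     # a matrix that is exactly rank K

# Any entry v_ij is the dot product of row factor i and column factor j.
i, j = 3, 4
assert np.isclose(V[i, j], W[i] @ H[:, j])

# The factors store (n + m) * K numbers instead of n * m.
print((n + m) * K, "numbers instead of", n * m)
```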
Slide 4: What is this for?

[Figure: the same factorization V (n x m) ≈ W (n x K) times H (K x m)]
Slide 5: MF for collaborative filtering
Slides 6-9: What is collaborative filtering?

[Figures illustrating collaborative filtering]
Slide 11: Recovering latent factors in a matrix

[Figure: V is an n-user by m-movie matrix with entries v11 … vij … vnm]

V[i,j] = user i's rating of movie j
Slide 12: Recovering latent factors in a matrix

[Figure: the n-user by m-movie matrix V ≈ W times H, where W has a row (xi, yi) per user and H has a column (aj, bj) per movie]

V[i,j] = user i's rating of movie j
Slide 14: MF for image modeling
Slide 16: MF for images

[Figure: V is a 1000-image by 10,000-pixel matrix (1000 x 10,000 = 10,000,000 entries), factored as V ≈ W times H with 2 prototype rows in H and 2 weights per image in W]

V[i,j] = pixel j in image i
2 prototypes: PC1, PC2
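One way to recover two prototypes like PC1 and PC2 is a truncated SVD. A toy sketch, far smaller than the 1000 x 10,000 matrix on the slide, with synthetic "images" made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_images, n_pixels, r = 50, 40, 2

# Synthetic stand-in for the image matrix: every image is a mixture of
# two prototype "images" plus a little noise (all made up for the sketch).
prototypes = rng.normal(size=(r, n_pixels))
weights = rng.normal(size=(n_images, r))
V = weights @ prototypes + 0.01 * rng.normal(size=(n_images, n_pixels))

# A truncated SVD recovers a rank-2 factorization V ~ W H.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :r] * s[:r]   # 2 coordinates per image
H = Vt[:r]             # 2 prototype rows, playing the role of PC1 and PC2

rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_err)
```

Because V is nearly rank 2 by construction, two prototypes explain almost all of it.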
Slide 17: MF for modeling text

Slide 18: Nine book titles about investing
The Neatest Little Guide to Stock Market Investing
Investing For Dummies, 4th Edition
The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns
The Little Book of Value Investing
Value Investing: From Graham to Buffett and Beyond
Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!
Investing in Real Estate, 5th Edition
Stock Investing For Dummies
Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss
https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

Slide 19: TF-IDF counts would be better
Slide 20: Recovering latent factors in a matrix

[Figure: V is an n-document by m-term matrix (a doc-term matrix), factored as V ≈ W times H with a row (xi, yi) per document and a column (aj, bj) per term]

V[i,j] = TF-IDF score of term j in doc i
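A toy sketch of this slide's pipeline, in the spirit of LSA: term counts, a simple TF-IDF reweighting, then a rank-2 SVD to embed documents. The mini-corpus and the whitespace tokenization are made up for illustration:

```python
import numpy as np

docs = [
    "stock market investing",
    "stock market investing guide",
    "real estate investing",
    "real estate investing secrets",
]
vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[d.split().count(w) for w in vocab] for d in docs], float)

# Simple TF-IDF: "investing" appears in every doc, so its IDF is zero.
df = (counts > 0).sum(axis=0)
tfidf = counts * np.log(len(docs) / df)

# A rank-2 SVD gives 2-dim document embeddings.
U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)
doc_vecs = U[:, :2] * s[:2]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

print("stock vs stock:", cos(doc_vecs[0], doc_vecs[1]))
print("stock vs real estate:", cos(doc_vecs[0], doc_vecs[2]))
```

In the embedded space the two stock-market documents sit together, and the real-estate documents sit apart from them.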
Slide 22: Investing for real estate
Rich Dad’s Advisor’s: The ABCs of Real Estate Investment …
Slide 23:
The little book of common sense investing: …
Neatest Little Guide to Stock Market Investing
Slide 24: MF is like clustering
Slide 25: k-means as MF

[Figure: the original data set X (n examples) ≈ Z times M, where Z is an n x r matrix of 0/1 indicators for r clusters and M holds the cluster means]
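The slide's claim can be checked directly: given cluster assignments, X ≈ Z M with Z the 0/1 indicator matrix and M the cluster means. A sketch with made-up data (names Z, M, X follow the slide):

```python
import numpy as np

# X ~ Z M: Z is the n x r 0/1 cluster-indicator matrix, M the r x d matrix
# of cluster means.
rng = np.random.default_rng(2)
n, d, r = 12, 2, 3

centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
assign = np.arange(n) % r                       # round-robin assignments
X = centers[assign] + 0.1 * rng.normal(size=(n, d))

Z = np.zeros((n, r))
Z[np.arange(n), assign] = 1.0                   # one 1 per row of Z
M = (Z.T @ X) / Z.sum(axis=0, keepdims=True).T  # row k = mean of cluster k

# Z M replaces each point by its cluster mean, so ||X - Z M||^2 is
# exactly the k-means objective for these assignments.
err = np.linalg.norm(X - Z @ M) ** 2
print("k-means objective:", err)
```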
Slide 26: How do you do it?

[Figure: the same factorization V (n x m) ≈ W (n x K) times H (K x m)]
Slide 27: talk pilfered from ….. KDD 2011
Slide 29: Recovering latent factors in a matrix

[Figure: V (n users x m movies) ≈ W times H, where W is n x r and H is r x m]

V[i,j] = user i's rating of movie j
Slide 33: for image denoising
Slide 34: Matrix factorization as SGD

[Figure: the SGD update equations, with the step size highlighted; annotation: why does this work?]
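A minimal sketch of the SGD scheme this slide describes: visit entries of V one at a time and nudge the matching user row and movie column along the squared-error gradient. The sizes, step size, and epoch count are assumptions for a toy run, not values from the talk:

```python
import numpy as np

# SGD for MF: for each entry v_ij, update W[i] and H[:, j] using the
# gradient of the squared error (v_ij - W[i] . H[:, j])^2.
rng = np.random.default_rng(3)
n, m, K = 20, 15, 3
V = rng.normal(size=(n, K)) @ rng.normal(size=(K, m))  # toy "ratings"

W = 0.1 * rng.normal(size=(n, K))   # initialize randomly, not at zero
H = 0.1 * rng.normal(size=(K, m))
step = 0.02                          # the step size in the update rule

entries = [(i, j) for i in range(n) for j in range(m)]
for epoch in range(200):
    rng.shuffle(entries)
    for i, j in entries:
        wi = W[i].copy()             # use the old value in both updates
        err = V[i, j] - wi @ H[:, j]
        W[i] += step * err * H[:, j]
        H[:, j] += step * err * wi

rmse = np.sqrt(np.mean((V - W @ H) ** 2))
print("train RMSE:", rmse)
```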
Slide 35: Matrix factorization as SGD - why does this work? Here’s the key claim:

[Figure: the key claim]
Slide 36: Checking the claim

Think of SGD for logistic regression: the LR loss compares y and ŷ = dot(w, x). MF is similar, but now we update both w (the user weights) and x (the movie weights).
Slide 37: What loss functions are possible?

N1, N2 - diagonal matrices, sort of like IDF factors for the users/movies
“generalized” KL-divergence
Slides 38-39: What loss functions are possible? [continued, figures]
Slide 40: ALS = alternating least squares
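An ALS sketch: with H fixed, every row of W is an ordinary least-squares solve, and symmetrically for H with W fixed; alternate until converged. The tiny ridge term is an assumption added for numerical stability, not part of the slide:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, K = 20, 15, 3
V = rng.normal(size=(n, K)) @ rng.normal(size=(K, m))  # toy rank-K data

W = rng.normal(size=(n, K))
H = rng.normal(size=(K, m))
lam = 1e-6                                  # tiny ridge term for stability

for it in range(30):
    # With H fixed: W = V H^T (H H^T)^-1, all rows solved at once.
    W = np.linalg.solve(H @ H.T + lam * np.eye(K), H @ V.T).T
    # With W fixed: H = (W^T W)^-1 W^T V.
    H = np.linalg.solve(W.T @ W + lam * np.eye(K), W.T @ V)

rmse = np.sqrt(np.mean((V - W @ H) ** 2))
print("ALS train RMSE:", rmse)
```

Each half-step solves its subproblem exactly, which is why ALS converges in far fewer epochs than SGD on this toy problem.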
Slide 41: talk pilfered from ….. KDD 2011
Slide 45: Similar to McDonnell et al with perceptron learning
Slide 46: Slow convergence…..
Slide 53: More detail….

- Randomly permute the rows and columns of the matrix.
- Chop V, W, H into blocks of size d x d: m/d blocks in W, n/d blocks in H.
- Group the data: pick a set of blocks with no overlapping rows or columns (a stratum); repeat until all blocks of V are covered.
- Train the SGD: process strata in series; process the blocks within a stratum in parallel.
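The stratum construction above can be illustrated with cyclic shifts, one valid way to pick sets of non-overlapping blocks (the talk may enumerate strata differently, e.g. via random permutations):

```python
# V is chopped into a d x d grid of blocks; a stratum is a set of d blocks
# sharing no rows and no columns, so they can be trained in parallel.
# Cyclic shifts give d strata that together cover every block.
d = 3  # a 3 x 3 grid of blocks, an assumption for illustration

strata = []
for k in range(d):
    stratum = [(i, (i + k) % d) for i in range(d)]
    rows = {i for i, j in stratum}
    cols = {j for i, j in stratum}
    assert len(rows) == d and len(cols) == d  # no shared rows or columns
    strata.append(stratum)

covered = {block for stratum in strata for block in stratum}
print(len(covered), "of", d * d, "blocks covered")
```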
Slide 54: More detail….

[Figure of the blocked matrix; note: Z here is the matrix we have been calling V]
Slide 55: More detail….

- Initialize W, H randomly, not at zero.
- Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”.
- Pick the strata sequence by permuting the rows and columns of M, and using M’[k,i] as the column index of row i in sub-epoch k.
- Use “bold driver” to set the step size: increase the step size when the loss decreases (over an epoch); decrease it when the loss increases.
- Implemented in Hadoop and R/Snowfall.

[Figure: the matrix M]
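The "bold driver" rule from this slide as a tiny sketch; the grow and shrink factors (1.05 and 0.5) are common choices, not values taken from the talk:

```python
# Bold driver: grow the step a little after an epoch where the loss went
# down, shrink it sharply after an epoch where the loss went up.
def bold_driver(step, prev_loss, curr_loss, grow=1.05, shrink=0.5):
    return step * grow if curr_loss < prev_loss else step * shrink

step = 0.1
losses = [10.0, 8.0, 6.5, 7.0, 5.9]      # made-up per-epoch losses
for prev, curr in zip(losses, losses[1:]):
    step = bold_driver(step, prev, curr)
print("final step size:", step)
```

The asymmetry (gentle growth, sharp shrinkage) makes recovery from a too-large step fast while still probing for larger steps when training is stable.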
Slide 57: Wall Clock Time

8 nodes, 64 cores, R/snow
Slide 62: Number of Epochs
Slide 67: Varying rank

100 epochs for all
Slide 68: Hadoop scalability

Hadoop process setup time starts to dominate
Slide 69: Hadoop scalability