Matrix Factorization: Recovering latent factors in a matrix

Presentation Transcript

Slide 1

Matrix Factorization

Slide 2

Recovering latent factors in a matrix

[Figure: an n × m ratings matrix V, rows = n users, columns = m movies, entries v11 … vij … vnm, where V[i,j] = user i's rating of movie j]

Slide 3

Recovering latent factors in a matrix

[Figure: the n × m ratings matrix V (entries v11 … vij … vnm, V[i,j] = user i's rating of movie j), approximated (~) by the product of an n × 2 user-factor matrix with rows (x1, y1), (x2, y2), …, (xn, yn) and a 2 × m movie-factor matrix with columns (a1, b1), (a2, b2), …, (am, bm)]

Slide 4
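To make the picture concrete, here is a minimal numpy sketch (names are illustrative, not from the talk) of approximating V by the product of user-factor and movie-factor matrices, using rank 2 as drawn on the slide:

```python
import numpy as np

n, m, r = 4, 5, 2             # n users, m movies, rank-2 factors as on the slide
rng = np.random.default_rng(0)

W = rng.normal(size=(n, r))   # row i holds user i's factors (x_i, y_i)
H = rng.normal(size=(r, m))   # column j holds movie j's factors (a_j, b_j)

V_hat = W @ H                 # V_hat[i, j] approximates user i's rating of movie j
print(V_hat.shape)            # (4, 5): n users x m movies
```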

Talk pilfered from ….. KDD 2011

Slide 5

Slide 6

Recovering latent factors in a matrix

[Figure: V ≈ W H, where V is the n × m ratings matrix (entries v11 … vij … vnm, V[i,j] = user i's rating of movie j), W is the n × r matrix of user factors, and H is the r × m matrix of movie factors]

Slide 7

Slide 8

Slide 9

Slide 10

for image denoising

Slide 11

Matrix factorization as SGD

step size

Slide 12
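A minimal sketch of matrix factorization as SGD under squared loss (illustrative names and a fixed step size eta; not the talk's exact pseudocode):

```python
import numpy as np

def sgd_mf(V, observed, r=10, eta=0.01, epochs=50, seed=0):
    """Factor V ~ W @ H by SGD over a list of observed (i, j) index pairs."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.normal(scale=0.1, size=(n, r))
    H = rng.normal(scale=0.1, size=(r, m))
    for _ in range(epochs):
        rng.shuffle(observed)                  # visit entries in random order
        for i, j in observed:
            err = V[i, j] - W[i, :] @ H[:, j]  # local loss is err**2
            w_old = W[i, :].copy()             # keep old value for H's update
            # the gradient of err**2 touches only row i of W and column j of H
            W[i, :] += eta * err * H[:, j]
            H[:, j] += eta * err * w_old
    return W, H
```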
Slide 13

Slide 14

Matrix factorization as SGD: why does this work?

step size

Slide 15

Matrix factorization as SGD: why does this work? Here's the key claim (in the KDD 2011 source, the claim is that SGD steps on entries sharing no row or column of V touch disjoint parts of W and H, so they can be taken independently):

Slide 16

Checking the claim

Think of SGD for logistic regression:
LR loss = compare y and ŷ = dot(w, x)
Similar here, but now we update both w (the user weights) and x (the movie weights); a sketch follows.

Slide 17
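A minimal sketch of that analogy (illustrative names; logistic loss for LR, squared loss for the factorization):

```python
import numpy as np

def lr_sgd_step(w, x, y, eta):
    """Logistic regression: x is fixed input data, only the weights w move."""
    y_hat = 1.0 / (1.0 + np.exp(-(w @ x)))   # ŷ = sigmoid(dot(w, x))
    return w + eta * (y - y_hat) * x

def mf_sgd_step(w_i, h_j, v_ij, eta):
    """Matrix factorization: the same shape of update, but both vectors move."""
    err = v_ij - w_i @ h_j
    return w_i + eta * err * h_j, h_j + eta * err * w_i
```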

What loss functions are possible?

N1, N2: diagonal matrices, sort of like IDF factors for the users/movies

"generalized" KL-divergence

Slide 18
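For reference, a sketch of the two losses named above; the generalized KL form is standard, while placing N1 and N2 in the regularizer of the weighted squared loss follows the KDD 2011 source and is an assumption about what the missing slide showed:

```latex
% Weighted squared loss; N_1, N_2 are diagonal user/movie weight matrices
L_{\mathrm{L2w}} = \sum_{(i,j)\,\mathrm{observed}} \bigl(V_{ij} - (WH)_{ij}\bigr)^2
                 + \lambda \bigl( \lVert N_1 W \rVert_F^2 + \lVert H N_2 \rVert_F^2 \bigr)

% "Generalized" KL-divergence between V and its reconstruction WH
L_{\mathrm{GKL}} = \sum_{i,j} \Bigl( V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij} \Bigr)
```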

What loss functions are possible?

Slide 19

What loss functions are possible?

Slide 20

ALS = alternating least squares

Slide 21
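A minimal dense-matrix sketch of ALS (illustrative: each half-step is a closed-form ridge-regression solve; a real recommender would fit only the observed entries):

```python
import numpy as np

def als(V, r=10, sweeps=20, lam=0.1, seed=0):
    """Alternate exact least-squares solves: W with H fixed, then H with W fixed."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    H = rng.normal(size=(r, m))
    for _ in range(sweeps):
        # argmin_W ||V - W H||_F^2 + lam ||W||_F^2, in closed form
        W = V @ H.T @ np.linalg.inv(H @ H.T + lam * np.eye(r))
        # argmin_H ||V - W H||_F^2 + lam ||H||_F^2, in closed form
        H = np.linalg.inv(W.T @ W + lam * np.eye(r)) @ W.T @ V
    return W, H
```

Each half-step is a convex problem even though the joint problem is not, which is the usual motivation for alternating.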

Talk pilfered from ….. KDD 2011

Slide 22

Slide 23

Slide 24

Slide 25

Similar to McDonnell et al. with perceptron learning

Slide 26

Slow convergence…..

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

More detail….

- Randomly permute the rows/columns of the matrix
- Chop V, W, H into blocks of size d × d: m/d blocks in W, n/d blocks in H
- Group the data: pick a set of blocks with no overlapping rows or columns (a stratum); repeat until all blocks in V are covered
- Train the SGD: process strata in series; process the blocks within a stratum in parallel (see the sketch below)

Slide 34
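A minimal sketch of that schedule, assuming a d × d grid of blocks; shifting the diagonal gives d strata whose blocks share no rows or columns, so the blocks within a stratum could run in parallel (sgd_on_block stands in for the inner SGD loop):

```python
def strata(d):
    """Yield d strata; stratum s pairs block-row i with block-column (i + s) % d."""
    for s in range(d):
        yield [(i, (i + s) % d) for i in range(d)]

def dsgd_epoch(V_blocks, W_blocks, H_blocks, d, sgd_on_block):
    for stratum in strata(d):       # strata are processed in series
        for bi, bj in stratum:      # safe to parallelize: no two blocks in a
            # stratum touch the same block of W or the same block of H
            sgd_on_block(V_blocks[bi][bj], W_blocks[bi], H_blocks[bj])
```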

More detail…. (here Z denotes the matrix called V above)

Slide 35

More detail….

- Initialize W, H randomly (not at zero)
- Choose a random ordering (a random sort) of the points in a stratum in each "sub-epoch"
- Pick the strata sequence by permuting the rows and columns of M, and using M'[k,i] as the column index of row i in sub-epoch k
- Use "bold driver" to set the step size: increase the step size when the loss decreases (in an epoch); decrease it when the loss increases (see the sketch below)
- Implemented in Hadoop and R/Snowfall

[Figure: M = the strata-sequence permutation matrix]

Slide 36
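A minimal sketch of the "bold driver" rule described above (the exact grow/shrink factors are illustrative assumptions):

```python
def bold_driver(eta, prev_loss, curr_loss, grow=1.05, shrink=0.5):
    """After each epoch: grow the step size if the loss fell, shrink it if it rose."""
    if curr_loss < prev_loss:
        return eta * grow      # loss decreased: take slightly bolder steps
    return eta * shrink        # loss increased: back off sharply
```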
Slide 37

Wall Clock Time

[Plot: wall-clock time; 8 nodes, 64 cores, R/snow]

Slide 38
Slide 39

Slide 40

Slide 41

Slide 42

Number of Epochs

Slide 43
Slide 44

Slide 45

Slide 46

Slide 47

Varying rank

100 epochs for all

Slide 48

Hadoop scalability

Hadoop process setup time starts to dominate

Slide 49

Hadoop scalability

Slide 50