Matrix Factorization: Recovering latent factors in a matrix

Uploaded by natalia-silvester, 2018-03-13 (PowerPoint presentation transcript)

Presentation Transcript

Slide 1

Matrix Factorization

Slide 2

Recovering latent factors in a matrix

[Figure: an n-row by m-column matrix V, with entries v11 … vij … vnm]

Slide 3

Recovering latent factors in a matrix

[Figure: V (n x m, entries v11 … vij … vnm) is approximated by the product of an n x K matrix of row factors (x1, y1), …, (xn, yn) and a K x m matrix of column factors (a1 … am; b1 … bm), here with K = 2]
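For squared error, the best rank-K factorization of this kind can be computed with a truncated SVD. A minimal sketch (all sizes and variable names hypothetical):

```python
import numpy as np

# Hypothetical example: approximate an n x m matrix V by the product of
# an n x K matrix W (row factors) and a K x m matrix H (column factors).
rng = np.random.default_rng(0)
n, m, K = 4, 3, 2
V = rng.standard_normal((n, m))

# A truncated SVD gives the best rank-K approximation in squared error.
U, s, Vt = np.linalg.svd(V, full_matrices=False)
W = U[:, :K] * s[:K]   # n x K row factors
H = Vt[:K, :]          # K x m column factors
approx = W @ H         # the rank-K reconstruction of V
```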

Slide 4

What is this for?

[Figure: the same rank-K factorization, V ~ (n x K row factors) x (K x m column factors)]

Slide 5

MF for collaborative filtering

Slide 6

What is collaborative filtering?

(Slides 7-10 continue the example in figures)

Slide 11

Recovering latent factors in a matrix

[Figure: an n-users x m-movies matrix V, entries v11 … vij … vnm]

V[i,j] = user i’s rating of movie j

Slide 12

Recovering latent factors in a matrix

[Figure: the ratings matrix V (n users x m movies) factored into n x K user factors (x1, y1), …, (xn, yn) and K x m movie factors (a1 … am; b1 … bm)]

V[i,j] = user i’s rating of movie j
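Once factors are learned, a rating is predicted as the dot product of the user's factor row and the movie's factor column. A hypothetical sketch with K = 2 (all numbers illustrative):

```python
import numpy as np

# Hypothetical sketch: with K = 2 latent factors per user and per movie,
# the predicted rating V[i, j] is the dot product of user i's factor row
# and movie j's factor column.
W = np.array([[1.0, 0.5],        # user 0 factors (x1, y1)
              [0.2, 1.5]])       # user 1 factors (x2, y2)
H = np.array([[2.0, 0.0, 1.0],   # movie factors a1, a2, a3
              [0.0, 2.0, 1.0]])  # movie factors b1, b2, b3

V_hat = W @ H                        # all n x m predicted ratings at once
pred = float(np.dot(W[0], H[:, 2]))  # user 0's predicted rating of movie 2
```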

Slide 13: (figure)

Slide 14

MF for image modeling

Slide 15: (figure)

Slide 16

MF for images

[Figure: a 1000-images x 10,000-pixels matrix V (entries v11 … vij … vnm) factored into image factors (x1, y1), …, (xn, yn) and pixel factors (a1 … am; b1 … bm)]

V[i,j] = pixel j in image i

2 prototypes: PC1, PC2
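A minimal sketch of the image case, assuming the "prototypes" PC1 and PC2 are principal components of the mean-centered image matrix (sizes shrunk from the slide's 1000 x 10,000 for illustration; data is random stand-in):

```python
import numpy as np

# Hypothetical sketch: n_images x n_pixels matrix, reduced to K = 2
# "prototype" directions (PC1, PC2) via a truncated SVD of the
# mean-centered image matrix.
rng = np.random.default_rng(1)
n_images, n_pixels, K = 100, 400, 2
V = rng.standard_normal((n_images, n_pixels))

mean_img = V.mean(axis=0)
U, s, Vt = np.linalg.svd(V - mean_img, full_matrices=False)
prototypes = Vt[:K]         # PC1, PC2: each a 400-pixel "image"
coords = U[:, :K] * s[:K]   # each image's 2-D coordinates

# Reconstruct: mean image plus coordinates times prototypes.
recon = mean_img + coords @ prototypes
```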

Slide 17

MF for modeling text

Slide 18

The Neatest Little Guide to Stock Market Investing

Investing For Dummies, 4th Edition

The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns

The Little Book of Value Investing

Value Investing: From Graham to Buffett and Beyond

Rich Dad’s Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not!

Investing in Real Estate, 5th Edition

Stock Investing For Dummies

Rich Dad’s Advisors: The ABC’s of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

Slide 19

https://technowiki.wordpress.com/2011/08/27/latent-semantic-analysis-lsa-tutorial/

TF-IDF counts would be better

Slide 20

Recovering latent factors in a matrix

[Figure: an n-documents x m-terms doc-term matrix V (entries v11 … vij … vnm) factored into document factors (x1, y1), …, (xn, yn) and term factors (a1 … am; b1 … bm)]

V[i,j] = TF-IDF score of term j in doc i
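A minimal LSA-style sketch, assuming a hand-built TF-IDF matrix over a toy stand-in corpus (the document strings below are hypothetical substitutes for the investing-book titles):

```python
import numpy as np

# Hypothetical sketch of LSA: build a small doc-term TF-IDF matrix by
# hand, then factor it with a truncated SVD.
docs = ["stock market investing",
        "investing for dummies",
        "real estate investing",
        "real estate profits"]
vocab = sorted({w for d in docs for w in d.split()})
tf = np.array([[d.split().count(w) for w in vocab] for d in docs], float)
df = (tf > 0).sum(axis=0)          # document frequency of each term
idf = np.log(len(docs) / df)
V = tf * idf                       # V[i, j] = TF-IDF of term j in doc i

U, s, Vt = np.linalg.svd(V, full_matrices=False)
K = 2
doc_coords = U[:, :K] * s[:K]      # documents in the latent "topic" space
```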

Slide 21: (figure)

Slide 22

Investing for real estate

Rich Dad’s Advisor’s: The ABCs of Real Estate Investment …

Slide 23

The little book of common sense investing: …

Neatest Little Guide to Stock Market Investing

Slide 24

MF is like clustering

Slide 25

k-means as MF

[Figure: X ~ Z M, where X is the original data set (n examples), Z is an n x r matrix of 0/1 indicators for r clusters, and M holds the r cluster means]
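The X ~ Z M view above can be sketched as a Lloyd-style loop: the assignment step rebuilds the indicator matrix Z, and the update step re-solves for the means M (data and initialization hypothetical):

```python
import numpy as np

# Hypothetical sketch: k-means written as a matrix factorization X ~ Z M,
# where Z holds 0/1 cluster indicators and M holds the cluster means.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
M = np.array([[0.0, 0.1], [4.0, 4.0]])   # initial means, r = 2

for _ in range(5):
    # Assignment: Z[i, k] = 1 iff mean k is closest to example i.
    d2 = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)
    Z = np.eye(len(M))[d2.argmin(axis=1)]
    # Update: each mean is the average of its assigned examples.
    M = (Z.T @ X) / Z.sum(axis=0)[:, None]
```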

Slide 26

How do you do it?

[Figure: the rank-K factorization V ~ (n x K row factors) x (K x m column factors) again]

Slide 27

talk pilfered from ….. KDD 2011

Slide 28: (figure)

Slide 29

Recovering latent factors in a matrix

[Figure: the ratings factorization again, now labeled V ~ W H, with W the n x r user-factor matrix and H the r x m movie-factor matrix (n users, m movies, rank r)]

V[i,j] = user i’s rating of movie j

Slides 30-32: (figures only)

Slide 33: for image denoising (figure)

Slide 34

Matrix factorization as SGD

[Figure: the SGD update equations, with the step size called out]

why does this work?
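A minimal sketch of matrix factorization as SGD, assuming plain squared loss on each entry and simultaneous updates of the user row and movie column (all names and sizes hypothetical):

```python
import numpy as np

# Hypothetical sketch: one SGD run for V ~ W H with squared loss.
# For each entry V[i, j], nudge row W[i] and column H[:, j] along the
# negative gradient of (V[i, j] - W[i] @ H[:, j]) ** 2.
rng = np.random.default_rng(2)
n, m, K, eta = 6, 5, 2, 0.05                 # eta is the step size
V = rng.standard_normal((n, K)) @ rng.standard_normal((K, m))  # rank-2 target
W = 0.1 * rng.standard_normal((n, K))        # initialize randomly, not at zero
H = 0.1 * rng.standard_normal((K, m))

err0 = np.linalg.norm(V - W @ H)             # error before training
for epoch in range(200):
    for i in range(n):
        for j in range(m):
            e = V[i, j] - W[i] @ H[:, j]
            # simultaneous update of the user row and the movie column
            W[i], H[:, j] = W[i] + eta * e * H[:, j], H[:, j] + eta * e * W[i]
err1 = np.linalg.norm(V - W @ H)             # error after training
```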

Slide 35

Matrix factorization as SGD - why does this work? Here’s the key claim:

Slide 36

Checking the claim

Think of SGD for logistic regression:

LR loss = compare y and ŷ = dot(w, x); similar here, but now update both w (the user weights) and x (the movie weights)

Slide 37

What loss functions are possible?

N1, N2: diagonal matrices, sort of like IDF factors for the users/movies

“generalized” KL-divergence
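A hypothetical sketch comparing two such losses on nonnegative data: plain squared error and the generalized KL-divergence used in NMF (data and factor sizes illustrative):

```python
import numpy as np

# Hypothetical sketch of two common MF loss functions on nonnegative
# data: squared error and the "generalized" KL-divergence
# D(V || WH) = sum( V * log(V / WH) - V + WH ).
rng = np.random.default_rng(3)
V = rng.random((4, 5)) + 0.1   # strictly positive data
W = rng.random((4, 2))
H = rng.random((2, 5))
P = W @ H                      # current reconstruction, all positive

sq_loss = ((V - P) ** 2).sum()
gkl_loss = (V * np.log(V / P) - V + P).sum()
```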

Slides 38-39: What loss functions are possible? (figures)

Slide 40

ALS = alternating least squares
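A minimal ALS sketch, assuming plain (unregularized) squared loss: hold H fixed and solve for W by least squares, then swap roles (all names and sizes hypothetical):

```python
import numpy as np

# Hypothetical ALS sketch: with H fixed, W has a closed-form
# least-squares solution; then swap roles and re-solve for H.
rng = np.random.default_rng(4)
n, m, K = 8, 6, 2
V = rng.standard_normal((n, K)) @ rng.standard_normal((K, m))  # rank-2 target
W = rng.standard_normal((n, K))
H = rng.standard_normal((K, m))

for _ in range(20):
    # min over W of ||W H - V||: transpose into the standard lstsq form
    W = np.linalg.lstsq(H.T, V.T, rcond=None)[0].T
    # min over H of ||W H - V||
    H = np.linalg.lstsq(W, V, rcond=None)[0]
```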

Slide 41: talk pilfered from ….. KDD 2011

Slides 42-44: (figures only)

Slide 45

Similar to McDonnell et al. with perceptron learning

Slide 46

Slow convergence…..

Slides 47-52: (figures only)

Slide 53

More detail….

Randomly permute the rows/cols of the matrix

Chop V, W, H into blocks of size d x d: n/d blocks in W, m/d blocks in H

Group the data: pick a set of blocks with no overlapping rows or columns (a stratum); repeat until all blocks in V are covered

Train the SGD: process strata in series; process blocks within a stratum in parallel

Slide 54
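The stratum construction from the blocking scheme can be sketched as follows (d hypothetical; the strata are the "diagonals" of the d x d block grid):

```python
# Hypothetical sketch of the blocking scheme: chop an n x m matrix into
# a d x d grid of blocks, then form d strata. Each stratum holds d
# blocks that share no rows and no columns, so the blocks within one
# stratum can be trained in parallel.
d = 3
strata = [[(i, (i + k) % d) for i in range(d)] for k in range(d)]

for stratum in strata:
    rows = [blk[0] for blk in stratum]
    cols = [blk[1] for blk in stratum]
    # no two blocks in a stratum share a row-block or a column-block
    assert len(set(rows)) == d and len(set(cols)) == d
```

Processing the d strata in series then covers every block of V exactly once per epoch.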

More detail…. (Z here plays the role of V)

Slide 55

More detail….

Initialize W, H randomly, not at zero

Choose a random ordering (random sort) of the points in a stratum in each “sub-epoch”

Pick the strata sequence by permuting the rows and columns of M, and using M’[k,i] as the column index of row i in sub-epoch k

Use “bold driver” to set the step size: increase the step size when the loss decreases (in an epoch); decrease the step size when the loss increases

Implemented in Hadoop and R/Snowfall
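The bold-driver rule can be sketched as follows (function name and the growth/shrink constants are hypothetical; the idea is only that the step size grows slightly after a good epoch and shrinks sharply after a bad one):

```python
# Hypothetical sketch of the "bold driver" heuristic: grow the step size
# slightly after an epoch where the loss decreased, shrink it sharply
# after an epoch where the loss increased.
def bold_driver(eta, loss, prev_loss, grow=1.05, shrink=0.5):
    return eta * grow if loss < prev_loss else eta * shrink

eta = 0.1
eta = bold_driver(eta, loss=4.0, prev_loss=5.0)  # loss fell: step size grows
eta = bold_driver(eta, loss=6.0, prev_loss=4.0)  # loss rose: step size shrinks
```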

[Figure: the matrix M]

Slide 56: (figure)

Slide 57

Wall Clock Time

8 nodes, 64 cores, R/snow

Slides 58-61: (figures only)

Slide 62

Number of Epochs

Slides 63-66: (figures only)

Slide 67

Varying rank

100 epochs for all

Slide 68

Hadoop scalability

Hadoop process setup time starts to dominate

Slide 69: Hadoop scalability (figure)

Slide 70: (figure)