Factorbird: a Parameter Server Approach to Distributed Matrix Factorization


Sebastian Schelter, Venu Satuluri, Reza Zadeh. Distributed Machine Learning and Matrix Computations workshop in conjunction with NIPS 2014.

Slide1

Factorbird: a Parameter Server Approach to Distributed Matrix Factorization

Sebastian Schelter, Venu Satuluri, Reza Zadeh

Distributed Machine Learning and Matrix Computations workshop in conjunction with NIPS 2014

Slide2

Latent Factor Models

Given M (sparse, n x m)

Returns U and V (rank k)

Applications: dimensionality reduction, recommendation, inference

Slide3

Seem familiar?

So why not just use SVD?

SVD!

Slide4

Problems with SVD

(Feb 24, 2015 edition)

Slide5

Revamped loss function

$g$ – global bias term

$b^U_i$ – user-specific bias term for user i

$b^V_j$ – item-specific bias term for item j

Prediction function: $p(i, j) = g + b^U_i + b^V_j + u_i^T v_j$

$a(i, j)$ – analogous to SVD's $m_{ij}$ (ground truth)

New loss function:
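The loss formula itself did not survive in this transcript. The following is a minimal sketch of what such a loss could look like, assuming a weighted squared error over observed entries plus L2 regularization. It uses only symbols that appear on these slides (p, a, w(i, j), λ); the exact form, the 1/2 factors, and all function names are assumptions, not the authors' definition.

```python
def dot(u, v):
    """Inner product of two equal-length factor vectors."""
    return sum(x * y for x, y in zip(u, v))


def predict(g, b_u, b_v, U, V, i, j):
    """p(i, j) = g + b^U_i + b^V_j + u_i^T v_j, as on the slide."""
    return g + b_u[i] + b_v[j] + dot(U[i], V[j])


def loss(g, b_u, b_v, U, V, entries, lam):
    """Assumed loss: half the weighted squared error over observed entries
    plus (lam / 2) times the squared L2 norm of every parameter.

    entries maps (i, j) -> (a_ij, w_ij).
    """
    err = sum(
        w * (predict(g, b_u, b_v, U, V, i, j) - a) ** 2
        for (i, j), (a, w) in entries.items()
    )
    reg = (
        g * g
        + sum(b * b for b in b_u)
        + sum(b * b for b in b_v)
        + sum(x * x for row in U for x in row)
        + sum(x * x for row in V for x in row)
    )
    return 0.5 * err + 0.5 * lam * reg
```

Setting every weight to 1 corresponds to the equal-weighting choice described in the experiments.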

Slide6

Algorithm
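The body of this slide did not survive in the transcript. Since the following slide names SGD, here is a minimal sketch of a single stochastic gradient step on one observed entry (i, j). It assumes a weighted squared-error loss with L2 regularization; the function names, default values, and update order are illustrative assumptions rather than the authors' algorithm.

```python
def predict(g, b_u, b_v, U, V, i, j):
    """p(i, j) = g + b^U_i + b^V_j + u_i^T v_j."""
    return g + b_u[i] + b_v[j] + sum(x * y for x, y in zip(U[i], V[j]))


def sgd_step(g, b_u, b_v, U, V, i, j, a, w=1.0, eta=0.01, lam=0.0):
    """Apply one SGD step for entry (i, j) with target a.

    Updates b_u, b_v, U and V in place and returns the new global bias g.
    """
    # Gradient of 0.5 * w * (p - a)^2 with respect to the prediction p.
    e = w * (predict(g, b_u, b_v, U, V, i, j) - a)
    u_old = list(U[i])  # keep the old u_i so v_j sees a consistent gradient
    for f in range(len(u_old)):
        U[i][f] -= eta * (e * V[j][f] + lam * U[i][f])
        V[j][f] -= eta * (e * u_old[f] + lam * V[j][f])
    b_u[i] -= eta * (e + lam * b_u[i])
    b_v[j] -= eta * (e + lam * b_v[j])
    return g - eta * (e + lam * g)
```

Each step touches only one row of U, one row of V, and the bias terms, which is what makes sparse, concurrent updates plausible later in the talk.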

Slide7

Problems

Resulting U and V, for graphs with millions of vertices, still equate to hundreds of gigabytes of floating point values.

SGD is inherently sequential; either locking or multiple passes are required to synchronize.

Slide8

Problem 1: size of parameters

Solution: Parameter Server architecture
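To make the idea concrete, here is a toy, single-process sketch of the parameter server pattern: factor vectors are sharded across partitions by index, and learners only read vectors and send additive updates. The class name, method names, and modulo sharding rule are hypothetical and are not Factorbird's actual interface.

```python
class ToyParameterServer:
    """In-memory stand-in for a set of parameter machines.

    Each partition is a dict from vertex index to its factor vector.
    """

    def __init__(self, num_partitions, k):
        self.k = k
        self.partitions = [dict() for _ in range(num_partitions)]

    def _shard(self, index):
        # Assumed sharding rule: index modulo number of partitions.
        return self.partitions[index % len(self.partitions)]

    def get(self, index):
        """Return a copy of the factor vector, creating zeros if absent."""
        return list(self._shard(index).setdefault(index, [0.0] * self.k))

    def update(self, index, delta):
        """Add delta to the stored vector (the learner sends gradients)."""
        vec = self._shard(index).setdefault(index, [0.0] * self.k)
        for f, d in enumerate(delta):
            vec[f] += d
```

Because no single partition has to hold all of U and V, the parameter set can grow beyond the memory of one machine.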

Slide9

Problem 2: simultaneous writes

Solution:

…so what?

Slide10

Lock-free concurrent updates?

Assumptions

f is Lipschitz continuously differentiable

f is strongly convex

Ω (maximum hyperedge size) is small

Δ (fraction of edges that intersect any variable) is small

ρ (sparsity of hypergraph) is small

Slide11

Hogwild! Lock-free updates
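As an illustration of the lock-free idea, the sketch below runs several threads that apply SGD updates to one shared weight list with no locking. The toy problem (each example touches two coordinates), all names, and the step size are assumptions made for the example; it is not Factorbird code, and Python threads only mimic the access pattern rather than delivering real parallel speedup.

```python
import random
import threading


def make_examples(d, n, seed=0):
    """Sparse toy data: each example touches two coordinates a and b,
    with target y = truth[a] + truth[b]."""
    rng = random.Random(seed)
    truth = [rng.uniform(-1, 1) for _ in range(d)]
    examples = []
    for _ in range(n):
        a, b = rng.sample(range(d), 2)
        examples.append((a, b, truth[a] + truth[b]))
    return examples


def mean_squared_error(w, examples):
    return sum((w[a] + w[b] - y) ** 2 for a, b, y in examples) / len(examples)


def worker(w, examples, eta, epochs, seed):
    rng = random.Random(seed)
    for _ in range(epochs):
        for a, b, y in rng.sample(examples, len(examples)):
            err = w[a] + w[b] - y
            # No lock here: another thread may write w[a] or w[b]
            # between the read above and the writes below.
            w[a] -= eta * err
            w[b] -= eta * err


def hogwild(examples, d, num_threads=4, eta=0.05, epochs=5):
    w = [0.0] * d  # shared state, updated by all threads
    threads = [
        threading.Thread(target=worker, args=(w, examples, eta, epochs, t))
        for t in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

Because each example touches only two of the d coordinates, most concurrent updates do not collide, which is the sparsity condition the assumptions on the previous slide describe.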

Slide12

Factorbird Architecture

Slide13

Parameter server architecture

Open source!

http://parameterserver.org/

Slide14

Factorbird Machinery

memcached – distributed memory object caching system

finagle – Twitter’s RPC system

HDFS – persistent filestore for data

Scalding – Scala front-end for Hadoop MapReduce jobs

Mesos – resource manager for learner machines

Slide15

Factorbird stubs

Slide16

Model assessment

Matrix factorization using RMSE

Root-mean squared error

SGD performance is often a function of hyperparameters

λ: regularization

η: learning rate

k: number of latent factors
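For reference, RMSE over a set of held-out entries can be computed as in the short sketch below. The function name and the list-based interface are illustrative choices.

```python
import math


def rmse(predicted, actual):
    """Root-mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))
```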

Slide17

[Hyper]Parameter grid search

aka “parameter scans”: finding the optimal combination of hyperparameters

Parallelize!
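A parallel grid search can be sketched as follows. The `validation_error` function is a hypothetical stand-in for "train a model with these hyperparameters and return its validation RMSE"; the grid values and worker count are likewise assumptions. A thread pool keeps the example self-contained, whereas a real scan would run each configuration on separate processes or machines.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def validation_error(params):
    """Hypothetical placeholder for training plus validation.

    Returns a synthetic error that is smallest at lam=0.1, eta=0.01, k=10.
    """
    lam, eta, k = params
    return (lam - 0.1) ** 2 + (eta - 0.01) ** 2 + abs(k - 10) / 100.0


def grid_search(lams, etas, ks, evaluate=validation_error, max_workers=4):
    """Evaluate every (lam, eta, k) combination in parallel and
    return the best combination with its error."""
    grid = list(product(lams, etas, ks))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        errors = list(pool.map(evaluate, grid))
    best = min(range(len(grid)), key=errors.__getitem__)
    return grid[best], errors[best]
```

The configurations are independent of one another, so the scan parallelizes without any coordination beyond collecting the results.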

Slide18

Experiments

RealGraph

Not a dataset; a framework for creating a graph of user-user interactions on Twitter

Kamath, Krishna, et al. "RealGraph: User Interaction Prediction at Twitter." User Engagement Optimization Workshop @ KDD. 2014.

Slide19

Experiments

Data: binarized adjacency matrix of subset of Twitter follower graph

a(i, j) = 1 if user i interacted with user j, 0 otherwise

All prediction errors weighted equally (w(i, j) = 1)

100 million interactions

440,000 [popular] users

Slide20

Experiments

80% training, 10% validation, 10% testing
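One simple way to produce such a split is to shuffle the observed entries and cut them by proportion, as sketched below. The slide does not say how the split was drawn, so the random shuffle, the seed, and the function name are assumptions.

```python
import random


def split_entries(entries, train=0.8, validation=0.1, seed=42):
    """Shuffle entries and split them into train / validation / test.

    The test set receives whatever remains after the first two cuts.
    """
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```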

Slide21

Experiments

k = 2

Homophily

Slide22

Experiments

Scalability of Factorbird

Large RealGraph subset

229M x 195M (44.6 quadrillion)

38.5 billion non-zero entries

Single SGD pass through training set: ~2.5 hours

~40 billion parameters

Slide23

Important to note

As with most (if not all) distributed platforms:

Slide24

Future work

Support streaming (user follows)

Simultaneous factorization

Fault tolerance

Reduce network traffic

s/memcached/custom application/g

Load balancing

Slide25

Strengths

Excellent extension of prior work

Hogwild, RealGraph

Current and [mostly] open technology

Hadoop, Scalding, Mesos, memcached

Clear problem, clear solution, clear validation

Slide26

Weaknesses

Lack of detail, lack of detail, lack of detail

How does number of machines affect runtime?

What were performance metrics of the large RealGraph subset?

What were some of the properties of the dataset (when was it collected, how were edges determined, what does “popular” mean, etc.)?

How did other factorization methods perform by comparison?

Slide27

Questions?

Slide28

Assignment 1

Code: 65pts

20: NBTrain (counting)

20: message passing and sorting

20: NBTest (scanning model, accuracy)

5: How to run

Q1-Q5: 35pts

Slide29

Assignment 2

Code: 70pts

20: MR for counting words

15: MR for counting labels

20: MR for joining model + test data

15: MR for classification

5: How to run

Q1-Q2: 30pts

Slide30

Assignment 3

Code: 50pts

10: Compute TF

10: Compute IDF

25: K-means iterations

5: How to run

Q1-Q4: 50pts