Factorbird: a Parameter Server Approach to Distributed Matrix Factorization

Uploaded On 2018-02-24


Sebastian Schelter, Venu Satuluri, Reza Zadeh. Distributed Machine Learning and Matrix Computations workshop in conjunction with NIPS 2014.


Presentation Transcript

Slide1

Factorbird: a Parameter Server Approach to Distributed Matrix Factorization

Sebastian Schelter, Venu Satuluri, Reza Zadeh

Distributed Machine Learning and Matrix Computations workshop in conjunction with NIPS 2014

Slide2

Latent Factor Models

Given M: sparse, n x m

Returns U and V: rank k

Applications: dimensionality reduction, recommendation, inference

Slide3
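As a rough illustration of the latent factor model on Slide2 (not taken from the slides): a rank-k factorization approximates each entry of M by the dot product of one row of U and one row of V. The small matrices and the `reconstruct` helper below are made up for the example.

```python
# Illustrative only: tiny hand-picked factors, not data from the talk.

def reconstruct(U, V):
    """Return the dense n x m matrix whose (i, j) entry is the dot product u_i . v_j."""
    return [[sum(x * y for x, y in zip(u, v)) for v in V] for u in U]

# n = 3 rows, m = 2 columns, rank k = 2
U = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]   # n x k: one latent vector per row of M
V = [[3.0, 1.0], [0.0, 4.0]]               # m x k: one latent vector per column of M

M_hat = reconstruct(U, V)
print(M_hat)
```

For large sparse M the dense product is never formed; only U and V (n·k + m·k values) are stored, and individual entries are predicted on demand.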

Seem familiar?

So why not just use SVD?

SVD!

Slide4

Problems with SVD

(Feb 24, 2015 edition)

Slide5

Revamped loss function

$g$ – global bias term

$b^U_i$ – user-specific bias term for user $i$

$b^V_j$ – item-specific bias term for item $j$

prediction function: $p(i, j) = g + b^U_i + b^V_j + u_i^T v_j$

$a(i, j)$ – analogous to SVD’s $m_{ij}$ (ground truth)

New loss function:

Slide6
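The prediction function from Slide5 can be sketched in a few lines. The loss formula itself did not survive in the transcript, so the update below assumes a weighted squared error with L2 regularization on every parameter; that assumption, and the names `predict` and `sgd_step`, are illustrative and not taken from the slides.

```python
# Sketch only. ASSUMED loss (the slide's formula is missing from the transcript):
#   1/2 * w(i,j) * (p(i,j) - a(i,j))^2  +  lam/2 * (squared L2 norm of the touched parameters)

def predict(g, bU, bV, U, V, i, j):
    """p(i, j) = g + bU[i] + bV[j] + u_i^T v_j, as defined on the slide."""
    return g + bU[i] + bV[j] + sum(x * y for x, y in zip(U[i], V[j]))

def sgd_step(g, bU, bV, U, V, i, j, a_ij, w_ij=1.0, eta=0.1, lam=0.0):
    """One SGD update for a single entry (i, j). Mutates bU, bV, U, V; returns the new g."""
    err = w_ij * (predict(g, bU, bV, U, V, i, j) - a_ij)
    u_old = list(U[i])  # copy so that v_j is updated with the pre-update u_i
    U[i] = [u - eta * (err * v + lam * u) for u, v in zip(U[i], V[j])]
    V[j] = [v - eta * (err * u + lam * v) for u, v in zip(u_old, V[j])]
    bU[i] -= eta * (err + lam * bU[i])
    bV[j] -= eta * (err + lam * bV[j])
    return g - eta * (err + lam * g)

# One user, one item, k = 2; the observed value is a(0, 0) = 1.
g, bU, bV = 0.0, [0.0], [0.0]
U, V = [[0.1, 0.1]], [[0.1, 0.1]]
error_before = abs(predict(g, bU, bV, U, V, 0, 0) - 1.0)
g = sgd_step(g, bU, bV, U, V, 0, 0, a_ij=1.0)
error_after = abs(predict(g, bU, bV, U, V, 0, 0) - 1.0)
print(error_before, error_after)
```

A single step only touches the global bias, one user's parameters and one item's parameters, which is what makes per-entry updates cheap on a sparse matrix.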

Algorithm

Slide7

Problems

Resulting U and V, for graphs with millions of vertices, still equate to hundreds of gigabytes of floating point values.

SGD is inherently sequential; either locking or multiple passes are required to synchronize.

Slide8

Problem 1: size of parameters

Solution: Parameter Server architecture

Slide9

Problem 2: simultaneous writes

Solution:

…so what?

Slide10

Lock-free concurrent updates?

Assumptions

f is Lipschitz continuously differentiable

f is strongly convex

Ω (size of hypergraph) is small

Δ (fraction of edges that intersect any variable) is small

ρ (sparsity of hypergraph) is small

Slide11
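The lock-free idea raised on Slide10 can be illustrated with a toy example: several workers apply SGD updates to one shared parameter vector without any locking, and each update touches a single coordinate. The objective, the `optimum` vector and the `worker` function are assumptions made for this sketch; CPython threads show the unsynchronized access pattern but give no real parallel speedup.

```python
import random
import threading

# Toy objective (an assumption): f(x) = 1/2 * sum_i (x[i] - optimum[i])^2.
# Each "sample" touches one coordinate, so concurrent updates rarely collide.
optimum = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, 4.0, -3.0]
x = [0.0] * len(optimum)  # shared parameters, written by all threads without a lock

def worker(seed, steps=2000, eta=0.1):
    rng = random.Random(seed)
    for _ in range(steps):
        i = rng.randrange(len(x))
        grad = x[i] - optimum[i]   # gradient of the sampled term
        x[i] -= eta * grad         # unsynchronized read-modify-write

threads = [threading.Thread(target=worker, args=(seed,)) for seed in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

max_error = max(abs(xi - oi) for xi, oi in zip(x, optimum))
print(max_error)
```

An occasional lost update only delays progress on one coordinate, which is why sparse problems tolerate the missing synchronization.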

Factorbird Architecture

Slide12

Parameter server architecture

Open source!

http://parameterserver.org/

Slide13

Factorbird Machinery

memcached – Distributed memory object caching system

finagle – Twitter’s RPC system

HDFS – persistent filestore for data

Scalding – Scala front-end for Hadoop MapReduce jobs

Mesos – resource manager for learner machines

Slide14

Factorbird stubs

Slide15

Model assessment

Matrix factorization using RMSE

Root-mean squared error

SGD performance often a function of hyperparameters

λ: regularization

η: learning rate

k: number of latent factors

Slide16
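For reference, the RMSE named on Slide15 is the square root of the mean squared difference between predictions and ground truth. A minimal sketch, with a hypothetical `rmse` helper and made-up values:

```python
import math

def rmse(predictions, truths):
    """Root-mean-square error over paired predictions and ground-truth values."""
    if len(predictions) != len(truths) or not predictions:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predictions, truths)]
    return math.sqrt(sum(squared) / len(squared))

# Three held-out entries: one predicted correctly, two off by 1.
print(rmse([1.0, 0.0, 1.0], [1.0, 1.0, 0.0]))
```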

[Hyper]Parameter grid search

aka “parameter scans:” finding the optimal combination of hyperparameters

Parallelize!

Slide17
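The grid search from Slide16 amounts to evaluating every combination of λ, η and k and keeping the best one; since the combinations are independent, they can run in parallel. In the sketch below, `validation_rmse` is a made-up stand-in for "train a model and measure validation RMSE", and the grid values are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def validation_rmse(lam, eta, k):
    """Stand-in for training with these hyperparameters and returning validation RMSE.
    The formula is invented so the example runs; its minimum is at (0.1, 0.01, 10)."""
    return abs(lam - 0.1) + abs(eta - 0.01) + abs(k - 10) / 100.0

grid = list(product([0.01, 0.1, 1.0],     # lambda: regularization
                    [0.001, 0.01, 0.1],   # eta: learning rate
                    [2, 10, 50]))         # k: number of latent factors

# Each grid point is independent of the others, so the scan parallelizes trivially.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(lambda params: validation_rmse(*params), grid))

best_score, best_params = min(zip(scores, grid))
print(best_params, best_score)
```

The cost grows multiplicatively with each hyperparameter (here 3 x 3 x 3 = 27 runs), which is the motivation for running the scan in parallel.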

Experiments

“RealGraph”

Not a dataset; a framework for creating a graph of user-user interactions on Twitter

Kamath, Krishna, et al. "RealGraph: User Interaction Prediction at Twitter." User Engagement Optimization Workshop @ KDD. 2014.

Slide18

Experiments

Data: binarized adjacency matrix of subset of Twitter follower graph

a(i, j) = 1 if user i interacted with user j, 0 otherwise

All prediction errors weighted equally (w(i, j) = 1)

100 million interactions

440,000 [popular] users

Slide19
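The binarization described on Slide18 can be sketched as follows. The interaction counts are hypothetical; only the definitions of a(i, j) and w(i, j) come from the slide.

```python
# Hypothetical interaction counts keyed by (user i, user j); not real data.
interaction_counts = {(0, 1): 5, (0, 2): 1, (2, 0): 12}

# Keep only the non-zero entries: any observed interaction becomes a 1.
nonzero = {pair for pair, count in interaction_counts.items() if count > 0}

def a(i, j):
    """Binarized entry: 1 if user i interacted with user j, 0 otherwise."""
    return 1 if (i, j) in nonzero else 0

def w(i, j):
    """All prediction errors are weighted equally, as stated on the slide."""
    return 1

print(a(0, 1), a(1, 0))
```

Note that the matrix is not symmetric: user 0 interacting with user 1 says nothing about the reverse direction.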

Experiments

80% training, 10% validation, 10% testing

Slide20
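An 80/10/10 split like the one on Slide19 might be produced as below. The slides do not say how entries were assigned to each set; a seeded random shuffle is assumed here, and `split_entries` is a hypothetical helper.

```python
import random

def split_entries(entries, seed=0):
    """Shuffle and split into 80% training, 10% validation, 10% testing.
    ASSUMPTION: a uniform random split; the slides do not describe the procedure."""
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_valid = len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])

train, valid, test = split_entries(range(1000))
print(len(train), len(valid), len(test))
```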

Experiments

k = 2

Homophily

Slide21

Experiments

Scalability of Factorbird

large RealGraph subset

229M x 195M (44.6 quadrillion)

38.5 billion non-zero entries

Single SGD pass through training set: ~2.5 hours

~ 40 billion parameters

Slide22
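The figures on Slide21 can be sanity-checked with a little arithmetic. The matrix dimensions, non-zero count and parameter count come from the slide; the 4 bytes per parameter is an assumption (32-bit floats), since the slides do not state the precision.

```python
rows, cols = 229_000_000, 195_000_000
nonzeros = 38_500_000_000
parameters = 40_000_000_000          # "~40 billion parameters" from the slide

cells = rows * cols                  # total number of matrix entries
density = nonzeros / cells           # fraction of entries that are non-zero

BYTES_PER_PARAM = 4                  # ASSUMPTION: 32-bit floats; not stated in the slides
model_gigabytes = parameters * BYTES_PER_PARAM / 1e9

print(cells, density, model_gigabytes)
```

Under that assumption the model alone needs on the order of 160 GB, in line with the "hundreds of gigabytes" mentioned on Slide7, while fewer than one in a million matrix entries is non-zero.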

Important to note

As with most (if not all) distributed platforms:

Slide23

Future work

Support streaming (user follows)

Simultaneous factorization

Fault tolerance

Reduce network traffic

s/memcached/custom application/g

Load balancing

Slide24

Strengths

Excellent extension of prior work: Hogwild, RealGraph

Current and [mostly] open technology: Hadoop, Scalding, Mesos, memcached

Clear problem, clear solution, clear validation

Slide25

Weaknesses

Lack of detail, lack of detail, lack of detail

How does number of machines affect runtime?

What were performance metrics of the large RealGraph subset?

What were some of the properties of the dataset (when was it collected, how were edges determined, what does “popular” mean, etc)?

How did other factorization methods perform by comparison?

Slide26

Questions?