Presentation Transcript

Slide1

Sareh Nabi, University of Washington – Microsoft Dynamics
Work started while an intern at Amazon

Bayesian Meta-Prior Learning Using Empirical Bayes

Paper in collaboration with: Houssam Nassif, Joseph Hong, Hamed Mamani, Guido Imbens

Slide2

Sequential decision making under uncertainty

Slide3

Exploration vs. Exploitation Trade-off

Explore/Learn: learn more about what is good and bad.
Exploit/Earn: make the best use of what we believe is good.

Slide4

Multi-armed Bandit (MAB)

Originally proposed by Robbins (1952).
Reward under arm i follows a Bernoulli distribution with unknown mean reward μ_i.
Goal: pull arms sequentially to maximize total reward.
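As an illustration of the Bernoulli MAB above, here is a minimal Thompson Sampling sketch (illustrative Python, not from the talk; the arm means and horizon are made up): each arm keeps a Beta posterior over its unknown mean reward, and at every step we play the arm whose posterior sample is largest.

```python
import numpy as np

def thompson_sampling(true_means, horizon, rng=np.random.default_rng(0)):
    """Bernoulli MAB via Thompson Sampling with Beta(1, 1) priors."""
    k = len(true_means)
    successes = np.ones(k)   # Beta alpha parameters (prior pseudo-counts)
    failures = np.ones(k)    # Beta beta parameters
    total_reward = 0
    for _ in range(horizon):
        samples = rng.beta(successes, failures)   # one posterior draw per arm
        arm = int(np.argmax(samples))             # play the most promising arm
        reward = rng.binomial(1, true_means[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward

print(thompson_sampling([0.2, 0.5, 0.7], horizon=10_000))
```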

Slide5

MAB Algorithms …

Algorithms that solve MAB problems:
Index policies: Gittins index.
Frequentist approach: Upper Confidence Bound (UCB) type algorithms.
Bayesian approach: the challenge of choosing a non-informative prior.
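For contrast with the Bayesian sketch above, here is a minimal UCB1 sketch, the canonical frequentist index policy of the kind mentioned here (illustrative Python; arm means are made up):

```python
import numpy as np

def ucb1(true_means, horizon, rng=np.random.default_rng(0)):
    """Bernoulli MAB via the UCB1 index policy (Auer et al., 2002)."""
    k = len(true_means)
    counts = np.zeros(k)
    means = np.zeros(k)
    for t in range(horizon):
        if t < k:
            arm = t                                    # play each arm once first
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)  # confidence bonus per arm
            arm = int(np.argmax(means + bonus))
        reward = rng.binomial(1, true_means[arm])
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean update
    return means

print(ucb1([0.2, 0.5, 0.7], horizon=10_000))
```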

Slide6

What we address …

Propose a framework to learn a meta-prior from initial data using empirical Bayes.
Decouple learning rates of model parameters by grouping similar ones in the same category and learning their meta-prior separately.

Next: an overview of our meta-prior framework and its application to our ad layout optimization problem (in the Amazon production system) and to a classification problem (on a public dataset). Lastly, a review of our theoretical results.

Slide7

Biking competition

Join now!
Check other teams

Stop biking pointlessly, join team Circle!
Year-round carousel rides!
Unlimited laps!
All you can eat Pi!
No corners to hide!
Hurry, don’t linger around!

Slide8

Multiple-slot template

Join now!
Check other teams
Stop biking pointlessly, join team Circle!
Year-round carousel rides!
Unlimited laps!
All you can eat Pi!
No corners to hide!
Hurry, don’t linger around!

[Same creative as before, with each element occupying one slot of the template.]

Slide9

Template Combinatorics: 48 layouts

Join now!
Check other teams
Stop biking pointlessly, join team Circle!
Best way to bicycle!
We cycle and re-cycle!
No point in not joining!

Variant counts per slot multiply: 2 × 2 × 2 × 2 × 3 = 48 layouts.

Slide10

Featurization

First-order features, X1: variant values, e.g. “image1”, “title2”, “color1”.
Second-order features, X2: variant combinations/interactions, e.g. “image1 AND title2”, “image1 AND color1”.
Concatenate for the final feature vector representation: X = [X1, X2].
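A minimal sketch of this featurization (illustrative Python; slot and variant names are hypothetical): first-order features are the chosen variant values, second-order features are their pairwise combinations, and X concatenates both.

```python
from itertools import combinations

def featurize(layout):
    """layout maps slot -> chosen variant, e.g. {"image": "image1", "title": "title2"}."""
    first_order = list(layout.values())                    # X1: variant values
    second_order = [f"{a} AND {b}"                         # X2: pairwise interactions
                    for a, b in combinations(sorted(layout.values()), 2)]
    return first_order + second_order                      # X = [X1, X2]

def one_hot(active, vocabulary):
    """Binary feature vector over a fixed vocabulary of feature names."""
    active = set(active)
    return [1 if name in active else 0 for name in vocabulary]

features = featurize({"image": "image1", "title": "title2", "color": "color1"})
print(features)
# ['image1', 'title2', 'color1', 'color1 AND image1', 'color1 AND title2', 'image1 AND title2']
```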

Slide11

Generalized Linear Models (GLM)

Assign one weight w_i per feature x_i.
Let r: reward (e.g. click), g: link function:

g(E[r]) = w_0 + Σ_{i ∈ X1} w_i x_i + Σ_{j ∈ X2} w_j x_j

(intercept + 1st-order + 2nd-order effects)

Slide12

Bayesian GLM

At time t, each weight w_i is represented by a distribution, e.g.:

w_{i,t} ~ N(μ_{i,t}, σ_{i,t}²)

prior + data = posterior

Slide13

Bayesian GLM weight updates

w_{i,t} update: w_{i,t} prior + data = w_{i,t+1} posterior.
Starting prior: non-informative standard normal.
Bayesian GLM bandits (Thompson Sampling): sample W from the current posterior, then play the action maximizing the predicted reward g⁻¹(W · X).
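A minimal sketch of one Thompson Sampling step over independent Gaussian weight posteriors (illustrative Python; the talk's production system uses BLIP with a probit link, while this sketch uses a logistic inverse link for simplicity):

```python
import numpy as np

def ts_select(mu, sigma2, layouts, rng=np.random.default_rng(0)):
    """One Thompson Sampling step for a Bayesian GLM bandit.

    mu, sigma2: posterior mean/variance per weight (independent Gaussians).
    layouts: (n_layouts, n_features) array, one feature vector per layout.
    """
    w = rng.normal(mu, np.sqrt(sigma2))        # sample W from the posterior
    scores = layouts @ w                       # linear score per layout
    rewards = 1.0 / (1.0 + np.exp(-scores))    # inverse link g^{-1} (logistic here)
    return int(np.argmax(rewards))             # play the best-looking layout
```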

Slide14

Empirical Bayes Approach

Slide15

Impose Bayesian hierarchy

Let θ_i: true mean of weight i.
Let (μ_i, σ_i²): GLM-estimated mean and variance of weight i.

Critical model assumptions:
Group features into K = 2 categories C_k (1st order, 2nd order).
Each category has a hierarchical meta-prior.

True mean θ_i generates the observed mean μ_i.

Slide16

Model assumptions

For each feature i in category k:

θ_i ~ N(m_k, ν_k²)   (meta-prior for category k)
μ_i | θ_i ~ N(θ_i, σ_i²)   (GLM estimate, observed around the true mean)

Slide17

Empirical prior estimation

Using variance decomposition and our assumptions:

Var(μ_i) = ν_k² + E[σ_i²]

Substituting empirical mean and variance estimators for category k with n_k features:

m̂_k = (1/n_k) Σ_i μ_i
ν̂_k² = (1/(n_k − 1)) Σ_i (μ_i − m̂_k)² − (1/n_k) Σ_i σ_i²
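A minimal sketch of the per-category estimator above (illustrative Python following the formulas on this slide; clipping the variance estimate at zero is an assumption for when the moment estimate goes negative):

```python
import numpy as np

def empirical_meta_prior(mu, sigma2):
    """Method-of-moments meta-prior for one feature category.

    mu, sigma2: GLM-estimated means and variances of the category's weights.
    Returns (meta-prior mean, meta-prior variance).
    """
    mu, sigma2 = np.asarray(mu), np.asarray(sigma2)
    m_hat = mu.mean()
    # Var(mu_i) = nu_k^2 + E[sigma_i^2]  =>  nu_k^2 = Var(mu_i) - mean(sigma_i^2)
    nu2_hat = mu.var(ddof=1) - sigma2.mean()
    return m_hat, max(nu2_hat, 0.0)  # clip: a variance cannot be negative
```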

Slide18

Empirical Bayes (EB) algorithm

1. Start Bayesian GLM with a non-informative prior.
2. After time t, compute (m̂_k, ν̂_k²) for each category.
3. Restart the model with per-category empirical priors.
4. Retrain using the data from the elapsed t timesteps.
5. Continue using the GLM as normal.

The algorithm uses the same data twice: once to compute the empirical prior and once to train the model. It works in both online and batch settings (see the sketch below).
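Putting steps 1–5 together (an illustrative sketch reusing `empirical_meta_prior` from the previous snippet; `train_glm`, `posterior_mean`, and `posterior_var` are hypothetical stand-ins for the Bayesian GLM interface, not the talk's actual code):

```python
def eb_restart(history, categories, train_glm):
    """Steps 1-5: flat-prior fit, per-category meta-priors, restart and retrain.

    categories: dict mapping category name -> indices of that category's weights.
    train_glm: hypothetical trainer returning a model with posterior_mean/var.
    """
    model = train_glm(history, prior={"mean": 0.0, "var": 1.0})  # step 1: non-informative
    priors = {}
    for k, idx in categories.items():                            # step 2: per category
        priors[k] = empirical_meta_prior(model.posterior_mean[idx],
                                         model.posterior_var[idx])
    return train_glm(history, prior=priors)                      # steps 3-5: restart, retrain, continue
```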

Slide19

Experiments and Results

Slide20

Experimental setup

Used Bayesian Linear Probit (BLIP) as our Bayesian GLM.
Grouped features as 1st- and 2nd-order features, with batch updates.
Simulations on a classification task (UCI income prediction).
Live experiments in the Amazon production system.
The meta-prior mean estimate m̂_k had little effect, so we focus on the meta-prior variance ν̂_k².

Slide21

When to compute the empirical prior

Empirical prior helps, even with one batch.
A longer reset time helps more.

Slide22

Lasting effect on small batches

BLIPBayes outperforms both other methods.
BLIPBayes can be especially valuable for small-batch training.

Slide23

Empirical outperforms random

Empirical Bayes did best.
Erring towards a large prior variance is more easily overcome by data than erring towards a small prior variance.

Slide24

Live experiment

BLIP with the Empirical Bayes meta-prior has a higher success rate and converges faster.

Slide25

Theoretical Results

Unbiasedness property of the meta-prior variance estimator: E[ν̂_k²] = ν_k².

Our meta-prior variance estimator is strongly consistent when:
(i) the feature estimates μ_i and their observed variances σ_i² are independent, and
(ii) expectation and variance exist for μ_i, σ_i², and θ_i.

Cumulative regret of our EB TS bandit is of the same order as standard Thompson Sampling.

Slide26

Takeaways

Empirical Bayes prior helps, mostly on low/medium traffic. Promising, as low-traffic cases are harder to optimize.
Effectively decouples learning rates by imposing a Bayesian hierarchy and learning a meta-prior per category of features.
The model converges faster with the Empirical Bayes meta-prior.
Works for any feature grouping in a Bayesian setting.
Applies to contextual and personalization use cases.

Slide27

Paper is on arXiv: arxiv.org/abs/2002.01129
Slides available on my website.

Feel free to reach out:
Website: www.sarehnabi.com
LinkedIn: www.linkedin.com/in/sarehnabi
Twitter: twitter.com/SarehNabi

Thank you!
Any Questions? Comments?