Slide 1
Sareh Nabi, University of Washington – Microsoft Dynamics
Work started while an intern at Amazon
Bayesian Meta-Prior Learning Using Empirical Bayes
Paper in collaboration with: Houssam Nassif, Joseph Hong, Hamed Mamani, Guido Imbens
Slide 2: Sequential decision making under uncertainty
Slide 3: Exploration vs. Exploitation Trade-off
Explore/Learn: learn more about what is good and bad.
Exploit/Earn: make the best use of what we believe is good.
Slide 4: Multi-armed Bandit (MAB)
Originally proposed by Robbins (1952).
The reward under arm i follows a Bernoulli distribution with unknown mean reward θ_i.
Goal: pull arms sequentially to maximize total reward.
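As an illustration of this setup, here is a minimal Beta-Bernoulli Thompson Sampling sketch for the Bernoulli-reward MAB described above (the `thompson_sampling` helper and its uniform prior are illustrative choices, not the paper's production setup):

```python
import random

def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on a K-armed Bernoulli bandit."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # Beta(1, 1) = uniform prior per arm
    beta = [1.0] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a mean-reward estimate per arm and pull the argmax.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward       # posterior update: successes
        beta[arm] += 1 - reward    # posterior update: failures
        total_reward += reward
    return total_reward, alpha, beta
```

Over a long horizon the sampler concentrates its pulls on the arm with the highest true mean, which is the explore/exploit behavior the previous slide motivates.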
Slide 5: MAB Algorithms
Algorithms that solve MAB problems:
- Index policies: Gittins index
- Frequentist approach: Upper Confidence Bound (UCB)-type algorithms
- Bayesian approach: the challenge of choosing a non-informative prior
Slide 6: What we address
- Propose a framework to learn a meta-prior from initial data using empirical Bayes.
- Decouple learning rates of model parameters by grouping similar ones in the same category and learning their meta-prior separately.
Next: an overview of our meta-prior framework and its application to our ad layout optimization problem (in the Amazon production system) and a classification problem (on a public dataset). Lastly, a review of our theoretical results.
Slide 7: Biking competition
Join now!
Check other teams
Stop biking pointlessly, join team Circle!
Year-round carousel rides!
Unlimited laps! All you can eat Pi! No corners to hide! Hurry, don't linger around!
Slide 8: Multiple-slot template
Join now!
Check other teams
Stop biking pointlessly, join team Circle!
Year-round carousel rides!
Unlimited laps! All you can eat Pi! No corners to hide! Hurry, don't linger around!
(Each region of the template is a slot.)
Slide 9: Template Combinatorics: 48 layouts
Join now!
Check other teams
Stop biking pointlessly, join team Circle!
Best way to bicycle!
We cycle and re-cycle! No point in not joining!
Each slot has interchangeable variants: 2 × 2 × 2 × 2 × 3 = 48 possible layouts.
Slide 10: Featurization
First-order features, X1: variant values, e.g. "image1", "title2", "color1".
Second-order features, X2: variant combinations/interactions, e.g. "image1 AND title2", "image1 AND color1".
Concatenate for the final feature vector representation: X = [X1, X2].
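The featurization above can be sketched in a few lines; the `featurize` helper below is an illustrative stand-in that builds the indicator feature names for one chosen layout (the production feature encoding may differ):

```python
from itertools import combinations

def featurize(variant_values):
    """Build first-order (variant value) and second-order (pairwise
    interaction) indicator feature names, then concatenate: X = [X1, X2]."""
    x1 = list(variant_values)  # e.g. ["image1", "title2", "color1"]
    x2 = [f"{a} AND {b}" for a, b in combinations(variant_values, 2)]
    return x1 + x2

# Example: 3 chosen variants -> 3 first-order + 3 second-order features.
features = featurize(["image1", "title2", "color1"])
```

This grouping into first- and second-order blocks is exactly the category structure the meta-prior framework later exploits.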
Slide 11: Generalized Linear Models (GLM)
Assign one weight w_i per feature x_i.
Let r be the reward (e.g. click) and g the link function:
g(E[r | X]) = w_0 + Σ_i w_i^(1) x_i^(1) + Σ_j w_j^(2) x_j^(2)
(intercept + 1st-order effects + 2nd-order effects)
Slide 12: Bayesian GLM
At time t, each weight w_i is represented by a distribution, e.g. w_{i,t} ~ N(μ_{i,t}, σ²_{i,t}).
Slide 13: Bayesian GLM weight updates
w_{i,t} update: w_{i,t} prior + data = w_{i,t+1} posterior.
Starting prior: non-informative standard normal.
Bayesian GLM bandits (Thompson Sampling): sample W from the posterior, then pull the arm with the highest predicted reward.
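The prior + data = posterior step can be illustrated with a conjugate normal-normal update on a single weight. Note this is a simplification: the deck's BLIP model uses a probit likelihood, whereas the hypothetical `gaussian_posterior` helper below assumes a Gaussian observation purely to show the mechanics:

```python
def gaussian_posterior(mu0, var0, obs, obs_var):
    """Conjugate normal-normal update for one weight w_i:
    prior N(mu0, var0) + one Gaussian observation -> posterior N(mu1, var1).
    Precisions (inverse variances) add; the posterior mean is the
    precision-weighted average of prior mean and observation."""
    precision = 1.0 / var0 + 1.0 / obs_var
    var1 = 1.0 / precision
    mu1 = var1 * (mu0 / var0 + obs / obs_var)
    return mu1, var1

# Standard-normal starting prior, then one observation at 1.0:
mu1, var1 = gaussian_posterior(0.0, 1.0, 1.0, 1.0)  # -> (0.5, 0.5)
```

Each observation shrinks the posterior variance, which is why an informative starting prior (the meta-prior introduced next) can change early-round behavior so much.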
Slide 14: Empirical Bayes Approach
Slide 15: Impose a Bayesian hierarchy
Let θ_i be the true mean of weight w_i, and (ŵ_i, σ̂²_i) the GLM-estimated mean and variance of weight w_i.
Critical model assumptions:
- Group features into k = 2 categories C_k (1st order, 2nd order).
- Each category has a hierarchical meta-prior:
  True mean: θ_i ~ N(m_k, τ²_k)
  Observed mean: ŵ_i | θ_i ~ N(θ_i, σ̂²_i)
Slide 16: Model assumptions
[Figure: the weights in each category (1st order, 2nd order) share a per-category meta-prior.]
Slide 17: Empirical prior estimation
Using the variance decomposition and our assumptions:
Var(ŵ_i) = τ²_k + E[σ̂²_i]
Substituting empirical mean and variance estimators:
m̂_k = average of the ŵ_i in category k; τ̂²_k = max(0, sample variance of the ŵ_i − average of the σ̂²_i)
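The substitution above can be sketched as a method-of-moments estimate for one feature category. The `empirical_meta_prior` helper below is a sketch under the variance decomposition on this slide; the paper's exact estimator may differ in details:

```python
def empirical_meta_prior(means, variances):
    """Method-of-moments meta-prior estimate for one feature category.
    By the decomposition Var(w_hat) = tau^2 + E[sigma_hat^2], the meta-prior
    variance is the excess of the between-feature variance of the estimated
    means over the average within-feature (GLM-estimated) variance."""
    n = len(means)
    m = sum(means) / n                                   # meta-prior mean
    between = sum((w - m) ** 2 for w in means) / (n - 1)  # sample variance
    within = sum(variances) / n
    tau2 = max(0.0, between - within)                    # clip at zero
    return m, tau2

m, tau2 = empirical_meta_prior([0.1, 0.3, 0.2, 0.4], [0.01, 0.01, 0.01, 0.01])
```

Clipping at zero handles the case where sampling noise makes the between-feature variance smaller than the average within-feature variance.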
Slide 18: Empirical Bayes (EB) algorithm
1. Start the Bayesian GLM with a non-informative prior.
2. After time t, compute the empirical meta-prior for each category.
3. Restart the model with the per-category empirical priors.
4. Retrain using the data from the elapsed t timesteps.
5. Continue using the GLM as normal.
The algorithm uses the same data twice: once to compute the empirical prior and once to train the model. It works in both online and batch settings.
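The restart loop above can be sketched as a small driver. Here `fit_glm` and `estimate_meta_prior` are hypothetical stand-ins for the production Bayesian GLM and the per-category moment estimator, injected as arguments so the sketch stays self-contained:

```python
def eb_restart(train_batch, fit_glm, estimate_meta_prior, categories):
    """Sketch of the EB restart (slide 18): fit with a non-informative
    prior, estimate a per-category meta-prior from the fitted weights,
    then refit the *same* batch starting from those empirical priors."""
    weights = fit_glm(train_batch, prior=None)  # non-informative start
    priors = {k: estimate_meta_prior(weights, k) for k in categories}
    return fit_glm(train_batch, prior=priors)   # retrain on the same data
```

The key design point is that the same batch is consumed twice, first to learn the meta-prior and then to train the model under it.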
Slide 19: Experiments and Results
Slide 20: Experimental setup
- Used Bayesian Linear Probit (BLIP) as our Bayesian GLM.
- Grouped features as 1st- and 2nd-order features.
- Batch updates.
- Simulations on a classification task (UCI income prediction).
- Live experiments in the Amazon production system.
Slide 21: When to compute the empirical prior
- The empirical prior helps, even with one batch.
- A longer reset time helps more.
Slide 22: Lasting effect on small batches
- BLIPBayes outperforms both other methods.
- BLIPBayes can be especially valuable for small-batch training.
Slide 23: Empirical outperforms random
- Empirical Bayes did best.
- Erring towards a large prior variance is more easily overcome by data than erring towards a small prior variance.
Slide 24: Live experiment
BLIP with the Empirical Bayes meta-prior has a higher success rate and converges faster.
Slide 25: Theoretical Results
- Unbiasedness property of the meta-prior variance estimator.
- Our meta-prior variance estimator is strongly consistent when:
  - the feature estimates and their observed variances are independent, and
  - the expectation and variance exist for the feature estimates, their observed variances, and the true means.
- A cumulative regret bound for our EB TS bandit (see the paper for the exact order).
Slide 26: Takeaways
- The Empirical Bayes prior helps, mostly on low/medium traffic. Promising, as low-traffic cases are harder to optimize.
- Effectively decouples learning rates by imposing a Bayesian hierarchy and learning a meta-prior per category of features.
- The model converges faster with the Empirical Bayes meta-prior.
- Works for any feature grouping in a Bayesian setting.
- Applies to contextual and personalization use cases.
Slide 27
Paper is on arXiv: arxiv.org/abs/2002.01129
Slides are available on my website.
Feel free to reach out:
Website: www.sarehnabi.com
LinkedIn: www.linkedin.com/in/sarehnabi
Twitter: twitter.com/SarehNabi
Thank you! Any questions or comments?