Slide 1
Unbiased Learning-to-Rank with Biased Feedback
Thorsten Joachims, Adith Swaminathan, Tobias Schnabel
Department of Computer Science & Department of Information Science, Cornell University
Slide 2
Learning-to-Rank from Clicks
[Figure: a deployed ranker presents rankings of results A–G for queries drawn from the query distribution, and users click on some of the presented results. A learning algorithm turns these click logs into a new ranker, which should perform better than the deployed ranker.]
Slide 3
Evaluating Rankings
[Figure: the deployed ranker presented A B C D E F G; the new ranker to evaluate would present F G D C E A B. With manually labeled relevance, the relevant documents appear at ranks 1, 2, 4 in the presented ranking and at ranks 3, 6, 7 in the new ranking.]
Slide 4
Evaluation with Missing Judgments
Loss: Δ(y | x, r), where r gives the relevance labels.
This talk: the sum of the ranks of the relevant documents,
Δ(y | x, r) = Σ_{d: r(d)=1} rank(d | y).
Assume: a click implies the result was observed and relevant.
Problem: no click can mean "not relevant" OR "not observed".
→ We need to understand the observation mechanism.
[Figure: presented ranking A–G with a click on one result.]
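The rank-based loss described above is straightforward to compute; here is a minimal Python sketch (the doc ids and the relevance set are hypothetical, chosen to match the earlier example rankings):

```python
def rank_loss(ranking, relevant):
    """Sum of the 1-based ranks of the relevant documents; lower is better."""
    return sum(i + 1 for i, d in enumerate(ranking) if d in relevant)

# With relevant docs {A, B, D}: the presented ranking A..G places them at
# ranks 1, 2, 4, while the ranking F G D C E A B places them at 3, 6, 7.
```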
Slide 5–6
Inverse Propensity Score Estimator
Observation propensities: the random variable o_i(d) indicates whether the relevance label r_i(d) for document d is observed; its propensity is Q(o_i(d)=1 | x_i, ȳ_i, r_i), where ȳ_i is the presented ranking.
Inverse Propensity Score (IPS) estimator:
Δ_IPS(y | x_i, r_i) = Σ_{d: o_i(d)=1 ∧ r_i(d)=1} rank(d | y) / Q(o_i(d)=1 | x_i, ȳ_i, r_i)
Unbiasedness: E_{o_i}[Δ_IPS(y | x_i, r_i)] = Δ(y | x_i, r_i).
Note: the propensities are needed only for the relevant/clicked documents.
[Figure: presented ranking with propensities A: 1.0, B: 0.8, C: 0.5, D: 0.2, E: 0.2, F: 0.2, G: 0.1, and the new ranking to evaluate.]
[Horvitz & Thompson, 1952] [Rosenbaum & Rubin, 1983] [Zadrozny et al., 2003] [Langford & Li, 2009] [Swaminathan & Joachims, 2015]
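As a sketch of how the IPS estimate could be computed for a single query, assuming hypothetical doc ids and the per-rank propensity values from the figure:

```python
def ips_rank_loss(new_ranking, clicked, propensity):
    """IPS estimate of the sum-of-ranks loss for one query.

    new_ranking: doc ids in the order the new ranker would show them.
    clicked:     doc ids that were clicked (i.e., observed AND relevant).
    propensity:  doc id -> Q(o(d)=1 | ...), the probability that the user
                 examined d in the *presented* ranking.
    """
    rank_of = {d: i + 1 for i, d in enumerate(new_ranking)}
    # Each clicked document contributes its rank in the new ranking,
    # reweighted by the inverse of its observation propensity.
    return sum(rank_of[d] / propensity[d] for d in clicked)

# Propensities from the slide's figure (assumed values):
prop = {"A": 1.0, "B": 0.8, "C": 0.5, "D": 0.2, "E": 0.2, "F": 0.2, "G": 0.1}
loss = ips_rank_loss(list("FGDCEAB"), clicked={"A", "D"}, propensity=prop)
```

Note that a click on the rarely examined D (propensity 0.2) is up-weighted by a factor of 5, while a click on the always-examined A is not reweighted at all.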
Slide 7
Full-Info Learning-to-Rank
Loss: Δ(y | x, r)
Risk: R(S) = ∫ Δ(S(x) | x, r) dP(x, r)
Empirical risk: R̂(S) = (1/n) Σ_{i=1..n} Δ(S(x_i) | x_i, r_i)
Training: Ŝ = argmin_{S∈𝒮} R̂(S)
Slide 8
ERM for Partial-Information LTR
Unbiased empirical risk:
R̂_IPS(S) = (1/n) Σ_{i=1..n} Σ_{d: o_i(d)=1 ∧ r_i(d)=1} rank(d | S(x_i)) / Q(o_i(d)=1 | x_i, ȳ_i, r_i)
ERM learning: Ŝ = argmin_{S∈𝒮} R̂_IPS(S)
Questions:
- How do we optimize this empirical risk in a practical learning algorithm?
- How do we define and estimate the propensity model Q?
Consistent estimator of the true risk → consistent ERM learning.
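The unbiasedness claim can be sanity-checked by simulation: repeatedly sample which relevant documents get observed according to their propensities, and average the resulting IPS estimates (all rankings, relevance sets, and propensity values here are hypothetical):

```python
import random

def true_rank_loss(new_ranking, relevant):
    """Full-information loss: sum of the ranks of the relevant documents."""
    rank_of = {d: i + 1 for i, d in enumerate(new_ranking)}
    return sum(rank_of[d] for d in relevant)

def simulate_ips(new_ranking, relevant, propensity, n_samples=200_000, seed=0):
    """Monte-Carlo check that E_o[Δ_IPS] equals the full-info loss Δ."""
    rng = random.Random(seed)
    rank_of = {d: i + 1 for i, d in enumerate(new_ranking)}
    total = 0.0
    for _ in range(n_samples):
        for d in relevant:                    # only relevant docs can be clicked
            if rng.random() < propensity[d]:  # observed with prob Q(o(d)=1)
                total += rank_of[d] / propensity[d]
    return total / n_samples

prop = {"A": 1.0, "B": 0.8, "C": 0.5, "D": 0.2, "E": 0.2, "F": 0.2, "G": 0.1}
# For ranking F G D C E A B with relevant docs {A, D}, the full-info loss is
# 6 + 3 = 9; the IPS average converges to the same value even though D is
# observed only 20% of the time.
```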
Slide 9–10
Propensity-Weighted SVM Rank
Data: S = {(x_1, y_1, q_1), ..., (x_n, y_n, q_n)}, where for query x_i the result y_i was clicked and q_i is its observation propensity; the other presented results for x_i serve as comparison candidates.
Training QP:
ŵ = argmin_{w, ξ≥0} (1/2) w·w + (C/n) Σ_i (1/q_i) Σ_j ξ_ij
s.t. w·[φ(x_i, y_i) − φ(x_i, y_j)] ≥ 1 − ξ_ij for every other result y_j of query x_i.
Loss bound: rank(y_i | y) ≤ 1 + Σ_j ξ_ij, so the propensity-weighted slacks optimize a convex upper bound on the unbiased IPS risk estimate.
[Joachims et al., 2002]
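The per-query objective that the QP minimizes can be sketched as a propensity-weighted pairwise hinge loss (the feature values in the test usage are made up for illustration):

```python
import numpy as np

def prop_weighted_hinge_loss(w, phi, clicked_idx, q):
    """Propensity-weighted pairwise hinge loss for one query (sketch).

    w:           weight vector.
    phi:         (n_docs, n_features) feature matrix, one row per candidate.
    clicked_idx: index of the clicked document.
    q:           examination propensity of the clicked document.
    """
    scores = phi @ w
    margins = scores[clicked_idx] - np.delete(scores, clicked_idx)
    # The hinge terms upper-bound (rank of clicked doc - 1);
    # the 1/q factor applies the IPS weight.
    return np.maximum(0.0, 1.0 - margins).sum() / q
```

Clicks with low examination propensity receive a large weight, compensating for how rarely results at those positions are observed at all.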
Slide 11–12
Position-Based Propensity Model
Model: Q(o_i(d)=1 | x_i, ȳ_i, r_i) = p_{rank(d | ȳ_i)}
Assumptions:
- Examination depends only on the rank of the result in the presented ranking.
- Clicks reveal relevance if the result is examined: c_i(d)=1 if o_i(d)=1 ∧ r_i(d)=1, and c_i(d)=0 otherwise.
→ The propensity of observing a result at rank k is p_k.
[Figure: presented ranking A–G with per-rank examination probabilities.]
[Richardson et al., 2007] [Chuklin et al., 2015] [Wang et al., 2016]
Slide 13
Estimating the Propensities
Experiment: measure the click rate at rank 1.
Intervention: swap the results at rank 1 and rank k, and measure the click rate at rank k.
Because the swap shows the same distribution of results at both ranks, relevance cancels and the ratio of click rates identifies the relative propensity p_k / p_1.
[Langford et al., 2009; Wang et al., 2016]
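A minimal sketch of this swap-intervention estimate, assuming the position-based model; the true propensities and relevance probability in the simulated check are hypothetical:

```python
import random

def estimate_relative_propensity(clicks_rank1, clicks_rankk):
    """Estimate p_k / p_1 from a swap(1, k) intervention.

    clicks_rank1: 0/1 click indicators for results shown at rank 1.
    clicks_rankk: 0/1 click indicators for the same distribution of
                  results shown at rank k after swapping.
    Under the position-based model, the click rate at rank r is p_r times
    the (rank-independent) probability of relevance, which cancels out.
    """
    rate1 = sum(clicks_rank1) / len(clicks_rank1)
    ratek = sum(clicks_rankk) / len(clicks_rankk)
    return ratek / rate1

# Simulated check with true p_1 = 1.0, p_5 = 0.2, P(relevant) = 0.5:
rng = random.Random(0)
c1 = [int(rng.random() < 1.0 * 0.5) for _ in range(100_000)]
c5 = [int(rng.random() < 0.2 * 0.5) for _ in range(100_000)]
```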
Slide 14
Experiments
Yahoo Web Search Dataset:
- Full-information LTR dataset with binarized relevance labels.
- Synthetic click data generated under the position-based propensity model with p_r = (1/r)^η.
- A baseline "deployed" ranker presents the results; click noise is added so that 33% of the clicks land on irrelevant documents.
[Figure: presented ranking A–G with simulated clicks.]
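One possible way to generate such synthetic clicks under the position-based model with p_r = (1/r)^η. Note that `noise` here is a per-examination click probability on irrelevant documents, a simplification of the slide's setup in which 33% of all clicks are noisy:

```python
import random

def generate_clicks(ranking, relevant, eta=1.0, noise=0.33, seed=None):
    """Generate synthetic clicks under the position-based model (sketch).

    A result at rank r is examined with probability (1/r)**eta.  An examined
    relevant result is always clicked; an examined irrelevant result is
    clicked with probability `noise`.  Returns the clicked doc ids and
    their examination propensities.
    """
    rng = random.Random(seed)
    clicks, propensities = [], {}
    for i, d in enumerate(ranking):
        p = (1.0 / (i + 1)) ** eta
        if rng.random() < p:                          # examined?
            if d in relevant or rng.random() < noise:  # clicked?
                clicks.append(d)
                propensities[d] = p
    return clicks, propensities
```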
Slide 15–16
Scaling with Training Set Size
[Figure: learning curves, with the deployed ranker shown as a baseline.]

Slide 17
Severity of Presentation Bias
[Figure.]

Slide 18
Increasing Click Noise
[Figure.]

Slide 19
Misspecified Propensities
[Figure: misspecifying the propensity model in one direction increases bias while reducing variance; in the other direction it increases both bias and variance.]
Slide 20
Real-World Experiment
Arxiv Full-Text Search:
- Run an intervention experiment to estimate the propensities.
- Collect training clicks using the production ranker.
- Train naive and propensity-weighted SVM-Rank (1000 features).
- Compare via A/B tests using interleaving.
Slide 21
Conclusions and Future Work
Partial-Information Learning-to-Rank problem:
- Sits between Batch Learning from Bandit Feedback and full-information learning-to-rank.
- Positive-only feedback; relevant for many ranking problems with partial labels.
Partial-Information Empirical Risk Minimization (ERM):
- Unbiased ERM objective despite biased partial feedback.
- Propensity Ranking SVM method.
Future research:
- Other loss functions? Other LTR algorithms?
- More sophisticated propensity estimation and modeling?
- How to handle the new bias-variance trade-off in the risk estimator?
Software: http://www.joachims.org/svm_light/svm_proprank.html
Slide 22
Multi-Label Classification / Ranking
Full-information feedback: input x, label vector y.
Goal: for a new x, predict a bitvector / ranking / subset.
Examples:
- Document tagging: x = doc, y = (Politics?, Europe?, ...)
- Object recognition: x = image, y = (Cow?, Plane?, ...)
- Search: x = query, y = (Doc1 relevant?, Doc2 relevant?, ...)
Problem: we almost never get reliable feedback for all labels!
[Figure: example tag set: USA, Politics, Election 2016, Trump Admin, Environment, Global Warming, ...]
Slide 23
Partial Feedback: Missing Labels
Labels are not missing uniformly at random → covariate shift.
Example: movie recommendation [Schnabel et al., 2016]
[Figure: examples × labels matrices contrasting full-information feedback r with partially revealed labels.]
Slide 24
Partial Feedback: Positive-Only
It is unclear whether a "0" means "missing" or "negative".
Example: tagging, object recognition [Jain et al., 2016]
[Figure: examples × labels matrices contrasting full-information feedback Y* with partial feedback in which only some positives are revealed.]
Slide 25
What is between BLBF and full-info LTR?
BLBF (Batch Learning from Bandit Feedback):
- No assumptions/knowledge about the loss function.
- Loss is observed only for the chosen y.
- The exploration policy provides randomized y with full support.
Unbiased Partial-Info LTR:
- Loss function is known/assumed.
- Relevance labels are only partially revealed.
- User behavior provides randomized relevance labels with full support.
Full-Info LTR:
- Loss function is known/assumed.
- Relevance labels are known for all documents.
Slide 26
[Figure: examples × labels matrix.]
Slide 27
Propensities in Partial-Info Learning-to-Rank
Setup:
- ȳ_i is the presented ranking.
- o_i(d) is a random variable indicating whether the relevance label r_i(d) is observed.
- c_i(d) is a random variable indicating whether d is clicked.
- Propensity of observing: Q(o_i(d)=1 | x_i, ȳ_i, r_i).
[Figure: examples × labels matrix.]