Slide 1: Joint Relevance and Freshness Learning From Clickthroughs for News Search
Hongning Wang (+), Anlei Dong (*), Lihong Li (*), Yi Chang (*), Evgeniy Gabrilovich (*)
(+) CS@UIUC   (*) Yahoo! Labs
Slide 2: Relevance vs. Freshness
- Relevance: topical relatedness; metrics: tf*idf, BM25, language models
- Freshness: temporal closeness; metrics: age, elapsed time
- Trade-off: both serve the user's information need
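For reference, textbook forms of these metrics, as a sketch (standard definitions, not necessarily the exact variants used in this work):

```latex
% Topical relevance: Okapi BM25 (textbook form; k_1, b are free parameters)
\mathrm{BM25}(q, d) = \sum_{t \in q} \mathrm{IDF}(t)\,
  \frac{\mathrm{tf}(t, d)\,(k_1 + 1)}
       {\mathrm{tf}(t, d) + k_1\bigl(1 - b + b\,|d|/\mathrm{avgdl}\bigr)}

% Temporal freshness: elapsed time (age) of document d at query time
\mathrm{age}(d, q) = t_{\mathrm{query}} - t_{\mathrm{publish}}(d)
```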
Slide 3: Freshness Is Important for News Search
Query "apple company" on Oct. 4, 2011: the top story is the release of the iPhone 4S.
Slide 4: Freshness Is Important for News Search
Query "apple company" on Oct. 5, 2011: "Steve Jobs passed away" displaces the release of the iPhone 4S as the top story.
Slide 5: Understanding the User's Information Need
The user's emphasis on relevance vs. freshness varies by query:
- Breaking-news queries prefer the latest reports (freshness driven), e.g., "apple company"
- Newsworthy queries prefer high-coverage, authoritative reports (relevance driven), e.g., "bin laden death"
Slide 6: Understanding the User's Information Need
[Result-page screenshots contrasting a breaking-news query with a newsworthy query]
Slide 7: Assessing the User's Information Need
- Unsupervised integration [Efron 2011, Li 2003]: limited to document timestamps
- Editorial judgments [Dong 2010, Dai 2011]: expensive to obtain in a timely fashion, and inadequate for recovering the end user's information need
Slide 8: Manipulating Editorial Annotations
- Freshness-demoted relevance: rule-based hard demotion [Dong 2010], e.g., if a result is somewhat outdated, demote it by one grade (say, from excellent to good); a sketch of such a rule follows
- Correlation: 0.5764 ± 0.6401
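A minimal sketch of such a hard demotion rule (the grade scale and age thresholds here are assumptions, not the exact rule from [Dong 2010]):

```python
# Illustrative freshness-demoted relevance: demote the editorial relevance
# grade one step when a result is somewhat outdated, two steps when stale.
GRADES = ["bad", "fair", "good", "excellent", "perfect"]

def demote(grade: str, age_hours: float) -> str:
    steps = 0
    if age_hours > 72:        # stale: demote two grades (assumed threshold)
        steps = 2
    elif age_hours > 24:      # somewhat outdated: demote one grade (assumed)
        steps = 1
    return GRADES[max(0, GRADES.index(grade) - steps)]

print(demote("excellent", 30))  # -> "good"
```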
Slide 9: User's Judgment on Relevance and Freshness
User browsing behavior under a freshness weight of 0.8; example results with relevance score R, freshness score F, and combined score Y:
- Result 1: R = 0.39, F = 2.34, Y = 1.95
- Result 2: R = 1.72, F = 2.18, Y = 2.01
- Result 3: R = 2.41, F = 1.76, Y = 2.09
Slide 10: Joint Relevance and Freshness Learning
JRFL maps (relevance, freshness) to clicks:
- Query => the relevance/freshness trade-off
- URL => relevance and freshness scores
- Click => the overall impression
Slide 11: Joint Relevance and Freshness Learning
Model formalization: the trade-off is latent and query-specific.
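In equation form, a sketch consistent with the slides (alpha_q denotes the latent query-specific freshness weight):

```latex
% Combined score of URL u for query q; \alpha_q \in [0, 1] is latent and
% query-specific (the freshness weight; cf. "freshness weight = 0.8" above).
S(q, u) = \alpha_q\, F(u) + (1 - \alpha_q)\, R(u)

% Clicks supply pairwise constraints: if u_1 is clicked and u_2 is skipped
% in the same result list, the model should satisfy
S(q, u_1) > S(q, u_2)
```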
Slide 12: Joint Relevance and Freshness Learning
- Linear instantiation; the associative property keeps each subproblem linear in its own parameters
- Relevance/freshness model learning
- Query model learning
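A sketch of the linear instantiation (the feature maps x_r, x_f, x_q are illustrative names, not the paper's notation):

```latex
% Each component is a linear function of its features:
R(u) = w_r^\top x_r(u), \qquad
F(u) = w_f^\top x_f(u), \qquad
\alpha_q = w_q^\top x_q \ \ \text{(clipped to } [0, 1])

% Associativity lets \alpha_q be folded into the URL features:
\alpha_q\,\bigl(w_f^\top x_f(u)\bigr) = w_f^\top \bigl(\alpha_q\, x_f(u)\bigr),
% so the pairwise objective is linear in (w_r, w_f) for fixed w_q, and
% linear in w_q for fixed (w_r, w_f) -- hence the alternating updates below.
```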
Slide 13: Coordinate Descent for JRFL
- Randomly initialize the query, relevance, and freshness models
- Repeat until convergence:
  - Update the relevance/freshness models with the query model fixed
  - Update the query model with the relevance/freshness models fixed
- Return the final model
Each update step is a convex program. A runnable sketch follows.
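A minimal sketch of this alternating loop, assuming the linear instantiation above and a squared hinge loss over click-preference pairs; the SGD-style inner updates, step sizes, and regularization are illustrative (the paper solves each step as a convex program):

```python
import numpy as np

rng = np.random.default_rng(0)

def jrfl_fit(pairs, n_iters=50, lr=0.01, reg=0.1):
    """pairs: (x_q, xr1, xf1, xr2, xf2) tuples, with URL 1 preferred to URL 2."""
    dq, dr, df = len(pairs[0][0]), len(pairs[0][1]), len(pairs[0][2])
    w_q = rng.normal(scale=0.1, size=dq)   # query (trade-off) model
    w_r = rng.normal(scale=0.1, size=dr)   # relevance model
    w_f = rng.normal(scale=0.1, size=df)   # freshness model
    for _ in range(n_iters):
        # Step 1: query model fixed; update relevance and freshness models.
        for x_q, xr1, xf1, xr2, xf2 in pairs:
            a = float(np.clip(w_q @ x_q, 0.0, 1.0))    # freshness weight
            margin = a * (w_f @ (xf1 - xf2)) + (1 - a) * (w_r @ (xr1 - xr2))
            if margin < 1.0:                           # squared hinge active
                g = 2.0 * (margin - 1.0)
                w_f -= lr * (g * a * (xf1 - xf2) + reg * w_f)
                w_r -= lr * (g * (1 - a) * (xr1 - xr2) + reg * w_r)
        # Step 2: relevance/freshness models fixed; update the query model.
        for x_q, xr1, xf1, xr2, xf2 in pairs:
            raw = float(w_q @ x_q)
            a = min(max(raw, 0.0), 1.0)
            dF = float(w_f @ (xf1 - xf2))
            dR = float(w_r @ (xr1 - xr2))
            margin = a * dF + (1 - a) * dR
            if margin < 1.0 and 0.0 < raw < 1.0:       # gradient passes clip
                g = 2.0 * (margin - 1.0)
                w_q -= lr * (g * (dF - dR) * x_q + reg * w_q)
    return w_q, w_r, w_f

# Tiny synthetic usage: 100 pairs with 3 query and 4 URL features each.
pairs = [tuple(rng.random(d) for d in (3, 4, 4, 4, 4)) for _ in range(100)]
w_q, w_r, w_f = jrfl_fit(pairs)
```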
Slide 14: Temporal Features
URL freshness features: identify freshness through content analysis.
Slide 15: Temporal Features
Query freshness features: capture the latent freshness preference.
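Illustrative examples of the two feature families these slides describe; the feature names and thresholds are assumptions, and the paper's actual feature set is richer:

```python
import math
from datetime import datetime

def url_freshness_features(publish_time: datetime, query_time: datetime) -> dict:
    """URL-side features: how old the document is at query time."""
    age_h = (query_time - publish_time).total_seconds() / 3600.0
    return {
        "age_hours": age_h,
        "is_same_day": float(0.0 <= age_h < 24.0),
        "log_age": math.log1p(max(age_h, 0.0)),
    }

def query_freshness_features(daily_counts: list) -> dict:
    """Query-side feature: how sharply today's query volume spikes relative
    to its recent history (a proxy for breaking-news intent)."""
    today, history = daily_counts[-1], daily_counts[:-1]
    baseline = sum(history) / max(len(history), 1)
    return {"volume_ratio": today / (baseline + 1.0)}

print(url_freshness_features(datetime(2011, 10, 4, 8), datetime(2011, 10, 4, 20)))
print(query_freshness_features([120, 90, 110, 105, 980]))
```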
Slide 16: Experiment Results
Data sets:
- Two months of Yahoo! News Search sessions
  - Normal bucket: top 10 positions
  - Random bucket [Li 2011]: top 4 positions randomly shuffled, giving an unbiased evaluation corpus
- Editorial judgments on one day's query log
Preference pair selection [Joachims 2005] (sketched below):
- Click > Skip Above
- Click > Skip Next
- Pairs ordered by Pearson's value
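A minimal sketch of the two click heuristics (function and variable names are illustrative):

```python
def preference_pairs(ranked_urls, clicked):
    """Joachims-style pairs from one impression: 'Click > Skip Above' pairs a
    clicked result against every unclicked result ranked above it; 'Click >
    Skip Next' pairs it against the unclicked result immediately below."""
    pairs = []  # (preferred_url, less_preferred_url)
    for i, url in enumerate(ranked_urls):
        if url not in clicked:
            continue
        pairs.extend((url, above) for above in ranked_urls[:i]
                     if above not in clicked)                 # Click > Skip Above
        if i + 1 < len(ranked_urls) and ranked_urls[i + 1] not in clicked:
            pairs.append((url, ranked_urls[i + 1]))           # Click > Skip Next
    return pairs

print(preference_pairs(["u1", "u2", "u3", "u4"], clicked={"u3"}))
# -> [('u3', 'u1'), ('u3', 'u2'), ('u3', 'u4')]
```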
Slide 17: Experiment Results
[Table of corpus statistics]
Slide 18: Analysis of JRFL: Convergence
Train/test sets: 90k/60k preference pairs; varying the initial query weight.
(a) Objective function value across updates

Slide 19: Analysis of JRFL: Convergence
Train/test sets: 90k/60k preference pairs; varying the initial query weight.
(b) Pairwise error rate across updates

Slide 20: Analysis of JRFL: Convergence
Train/test sets: 90k/60k preference pairs; varying the initial query weight.
(c) Query weight across updates
Slide 21: Analysis of JRFL: Feature Weight Learning
[Learned feature weights]
Slide 22: Analysis of JRFL: Relevance and Freshness Learning
- Baseline: GBRank trained on Dong et al.'s relevance/freshness annotation set
- Test corpus: the editors' one-day annotation set
- The annotation-trained baseline serves as an upper bound
Slide 23: Analysis of JRFL: Query Weight Analysis
Slide 24: Analysis of JRFL: Query Weight Analysis
Query length differs significantly between relevance-driven and freshness-driven queries.
Slide 25: Quantitative Comparison
Ranking performance on random-bucket clicks.

Slide 26: Quantitative Comparison
Ranking performance on normal clicks.

Slide 27: Quantitative Comparison
Ranking performance on editorial annotations.
Slide 28: Qualitative Comparison
CTR distribution revisited: correlation 0.7163 ± 0.1673, up from 0.5764 ± 0.6401 for freshness-demoted relevance.
Slide 29: Conclusions
- Joint relevance and freshness learning with a query-specific preference
- Learned from query logs, using temporal features
Future work:
- Personalized retrieval
- A broader spectrum of the user's information need, e.g., trustworthiness, opinion
Slide 30: References
[Efron 2011] M. Efron and G. Golovchinsky. Estimation methods for ranking recent information. In SIGIR, pages 495–504, 2011.
[Li 2003] X. Li and W. B. Croft. Time-based language models. In CIKM, pages 469–475, 2003.
[Dong 2010] A. Dong, Y. Chang, Z. Zheng, G. Mishne, J. Bai, R. Zhang, K. Buchner, C. Liao, and F. Diaz. Towards recency ranking in web search. In WSDM, pages 11–20, 2010.
[Dai 2011] N. Dai, M. Shokouhi, and B. D. Davison. Learning to rank for freshness and relevance. In SIGIR, pages 95–104, 2011.
[Li 2011] L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pages 297–306, 2011.
[Joachims 2005] T. Joachims, L. Granka, B. Pan, H. Hembrooke, and G. Gay. Accurately interpreting clickthrough data as implicit feedback. In SIGIR, pages 154–161, 2005.
Slide 31: Thank You!
Q&A