Slide 1
User Modeling in Search Logs via A Non-parametric Bayesian Approach
Hongning Wang¹, ChengXiang Zhai¹, Feng Liang²
¹Department of Computer Science, ²Department of Statistics
University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
{wang296,czhai,liangf}@Illinois.edu
Anlei Dong, Yi Chang
Yahoo! Labs, 701 First Avenue, Sunnyvale, CA 94089 USA
{anlei,yichang}@yahoo-inc.com
Slide 2
Need to understand users’ search intent!
schedule of the games on Sunday
any news event for the Olympics
what is non-parametric Bayes?
2/26/2014
WSDM'2014 @ New York City
Slide 3
Mining search logs provides the opportunity
[Figure: search log flow User → Query → Documents → Clicks, with example queries: sochi winter Olympics, obamacare, affordable health care plan, super bowl 2014, health care reform]
Query-centric analysis:
Query categories [Jansen et al. IPM 2000]
Temporal query dynamics [Kulkarni et al. WSDM’11]
Click-centric analysis:
Interpreting clickthrough data [Joachims et al. SIGIR’05, Agichtein et al. SIGIR’06]
Click modeling [Dupret and Piwowarski SIGIR’08, Chapelle and Zhang WWW’09]
Isolated analysis vs. holistic view
Slide 4
Prior art
Query-cluster-based approaches: Ranking Specialization for Web Search, Bian et al. WWW’10; Learning to rank user intent, Giannopoulos et al. CIKM’11
Divide-and-conquer strategy:
Group queries into clusters
Estimate independent ranking models for each cluster
No user-specific information is considered
Queries and clicks are still analyzed separately
Slide 5
Road map
Motivation
Our solution: dpRank
Experimental results
Conclusions
Slide 6
Latent user group: a homogeneous unit of queries and clicks
[Figure: Group k, with a query distribution p(Q) over queries q1, q2, q3 and a ranking preference over features f1, f2]
Modeling of search interest
Modeling of result preferences
Our Contribution
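A minimal sketch of what one latent user group bundles together, assuming a categorical distribution p(Q) over queries (search interest) and a logistic pairwise model over ranking features f1, f2 (result preferences). The class name, the toy numbers, and the logistic form are illustrative assumptions, not taken from the slide.

```python
import math
from dataclasses import dataclass


@dataclass
class LatentUserGroup:
    """Hypothetical container for one latent user group (illustration only)."""
    query_probs: dict  # p(Q): categorical distribution over query strings
    weights: list      # ranking preference: one weight per feature (f1, f2, ...)

    def query_likelihood(self, query):
        """p(q | group): probability that this group issues `query` (0.0 if unseen)."""
        return self.query_probs.get(query, 0.0)

    def preference_prob(self, x_preferred, x_other):
        """P(first document preferred over second), assumed logistic in w . (x_a - x_b)."""
        margin = sum(w * (a - b) for w, a, b in zip(self.weights, x_preferred, x_other))
        return 1.0 / (1.0 + math.exp(-margin))


group_k = LatentUserGroup(query_probs={"q1": 0.5, "q2": 0.3, "q3": 0.2}, weights=[1.0, 0.5])
print(group_k.query_likelihood("q2"))                             # 0.3
print(round(group_k.preference_prob([1.0, 0.0], [0.0, 0.0]), 4))  # 0.7311
```

Keeping both parts in one object reflects the slide's point: a group is a single unit that explains what is searched and which results get clicked.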
Slide 7
User: a heterogeneous mixture over the latent user groups
[Figure: one user’s queries spread across Group 1, Group 2, …, Group k, e.g., a “stock market” group (market report, AAPL, apple, TWTR, FB, GOOG, fidelity online login) and a “nutrition of fruits” group (apple, orange, banana, nutrition, fruit receipt, fruit smoothie); each group has its own query distribution p(Q) over q1, q2, q3 and its own ranking preference over features such as BM25 and PageRank]
Our Contribution
Slide 8
Generation of latent user groups: Dirichlet Process priors [Ferguson, 1973]
[Figure: an unbounded sequence of latent user groups (Group 1, …, Group k, …, Group c, …), each with its own query distribution p(Q) over q1, q2, q3 and its own ranking preference over features f1, f2]
Our Contribution
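A DP prior lets the number of latent user groups grow with the data instead of fixing k in advance. The sketch below draws group proportions with the truncated stick-breaking construction, a common finite approximation of a DP that the slide does not itself mention; the truncation level, γ, and the toy base measure H are illustrative assumptions.

```python
import random


def stick_breaking_weights(gamma, truncation, rng):
    """Truncated stick-breaking (GEM) weights approximating the group proportions of DP(gamma, H)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, gamma)  # break off a Beta(1, gamma) fraction of what is left
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)            # the last group absorbs the leftover mass
    return weights


def toy_base_measure(rng, num_queries=3, num_features=2):
    """Hypothetical H: a Dirichlet(1, ..., 1) query distribution plus Gaussian ranking weights."""
    raw = [rng.gammavariate(1.0, 1.0) for _ in range(num_queries)]
    total = sum(raw)
    query_probs = {f"q{i + 1}": r / total for i, r in enumerate(raw)}
    weights = [rng.gauss(0.0, 1.0) for _ in range(num_features)]
    return query_probs, weights


def draw_latent_groups(gamma, truncation, rng):
    """Pair each stick-breaking weight with group parameters drawn i.i.d. from H."""
    return [(w, toy_base_measure(rng)) for w in stick_breaking_weights(gamma, truncation, rng)]


rng = random.Random(0)
groups = draw_latent_groups(gamma=1.0, truncation=20, rng=rng)
for weight, (query_probs, _) in groups[:3]:
    print(round(weight, 3), {q: round(p, 2) for q, p in query_probs.items()})
```

A larger γ spreads mass over more groups; a smaller γ concentrates it on the first few.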
Slide 9
Another layer of DP to support an infinite mixture of latent user groups [Teh et al., 2006]
[Figure: the shared latent user groups (Group 1, …, Group k, …, Group c, …), each with a query distribution p(Q) over q1, q2, q3 and a ranking preference over f1, f2; each user draws its own mixture over Group 1, Group 2, …, Group k]
Our Contribution
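In a hierarchical DP, each user's membership is drawn from a second-level DP whose base measure is the shared, discrete set of groups, so all users reuse the same groups with user-specific weights. Under a finite K-group truncation this reduces to a Dirichlet draw centered on the global proportions β. The sketch assumes that truncation; α and β are hypothetical values.

```python
import random


def dirichlet(alphas, rng):
    """Sample a Dirichlet vector by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def user_mixing_proportions(alpha, global_weights, rng):
    """Truncated HDP: pi_u ~ Dirichlet(alpha * beta_1, ..., alpha * beta_K) over the shared groups."""
    return dirichlet([alpha * b for b in global_weights], rng)


rng = random.Random(42)
beta = [0.5, 0.3, 0.2]  # hypothetical global group proportions shared by all users
pis = [user_mixing_proportions(alpha=5.0, global_weights=beta, rng=rng) for _ in range(2000)]
print([round(p, 3) for p in pis[0]])                                      # one user's own mixture
print([round(sum(pi[k] for pi in pis) / len(pis), 2) for k in range(3)])  # near beta on average
```

The concentration α controls how far individual users may drift from the population-level proportions: large α keeps every user close to β.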
Slide 10
A fully generative model for users’ search behaviors
dpRank model
1. Draw latent user groups from a DP
2. Draw group membership for each user from a DP
3. To generate a query for user u:
3.1 Draw a latent user group c
3.2 Draw query q_i for user u from group c
3.3 Draw click preferences for q_i from group c
Our Contribution
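Steps 1 to 3.3 can be simulated end to end to make the generative story concrete. This is a sketch under stated assumptions, not dpRank's exact specification: a K-group finite (symmetric Dirichlet) approximation stands in for both DPs, queries come from a categorical p(Q | c), and each document pair yields a preference through a logistic function of w_c · (x_i − x_j). The vocabulary, feature count, and hyperparameters are hypothetical.

```python
import math
import random

QUERIES = ["q1", "q2", "q3"]  # hypothetical query vocabulary
NUM_FEATURES = 2              # hypothetical ranking features f1, f2


def dirichlet(alphas, rng):
    """Sample a Dirichlet vector by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def generate_user_log(pi_u, groups, num_queries, docs_per_query, rng):
    """Steps 3.1-3.3 for one user u, repeated for each of the user's queries."""
    log = []
    for _ in range(num_queries):
        c = rng.choices(range(len(groups)), weights=pi_u)[0]  # 3.1 latent user group c
        query_probs, w = groups[c]
        q = rng.choices(QUERIES, weights=query_probs)[0]      # 3.2 query q_i from group c
        docs = [[rng.random() for _ in range(NUM_FEATURES)] for _ in range(docs_per_query)]
        prefs = []                                            # 3.3 pairwise click preferences
        for i in range(docs_per_query):
            for j in range(i + 1, docs_per_query):
                margin = sum(wf * (a - b) for wf, a, b in zip(w, docs[i], docs[j]))
                prefs.append((i, j) if rng.random() < sigmoid(margin) else (j, i))
        log.append((c, q, prefs))
    return log


rng = random.Random(7)
K, gamma, alpha = 3, 3.0, 2.0
# Step 1 (finite approximation): global proportions and per-group parameters.
beta = dirichlet([gamma / K] * K, rng)
groups = [(dirichlet([1.0] * len(QUERIES), rng), [rng.gauss(0.0, 1.0) for _ in range(NUM_FEATURES)])
          for _ in range(K)]
# Step 2: user u's group membership, centered on the global proportions.
pi_u = dirichlet([alpha * b for b in beta], rng)
for c, q, prefs in generate_user_log(pi_u, groups, num_queries=5, docs_per_query=3, rng=rng):
    print(f"group={c} query={q} preferences={prefs}")
```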
Slide 11
Latent variables of interest
Variables that characterize the generation of queries in a latent user group
A variable that depicts users’ result ranking preferences in a latent user group
A variable that profiles a user’s search intent over the latent user groups
Gibbs sampling for posterior inference
Slide 12
Road map
Motivation
Our solution: dpRank
Experimental results
Conclusions
Slide 13
Data collection
Yahoo! News search logs, May to July 2011
65 ranking features for each query-URL pair, e.g., document age, site authority, query matching in title
URL features aggregated into query features [Bian et al. WWW’10]
For each user, in chronological order: the first 60% of queries for training, the remaining 40% for testing
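The per-user chronological split can be written as below, assuming each log record carries a user id, a timestamp, and the query string. The record format and the round-down rule for the 60% cut are assumptions, since the slide does not say how fractional counts are handled.

```python
from collections import defaultdict


def chronological_split(records, train_fraction=0.6):
    """Per-user temporal split: earliest 60% of each user's queries train, the rest test.

    `records` is a hypothetical list of (user_id, timestamp, query) tuples.
    """
    by_user = defaultdict(list)
    for user, ts, query in records:
        by_user[user].append((ts, query))
    train, test = {}, {}
    for user, items in by_user.items():
        items.sort()                            # chronological order within each user
        cut = int(len(items) * train_fraction)  # training size, rounded down
        train[user] = [q for _, q in items[:cut]]
        test[user] = [q for _, q in items[cut:]]
    return train, test


records = [("u1", 3, "nba"), ("u1", 1, "iran"), ("u1", 5, "nfl lockout"),
           ("u1", 2, "libya"), ("u1", 4, "miami heat"), ("u2", 1, "lady gaga")]
train, test = chronological_split(records)
print(train["u1"], test["u1"])  # ['iran', 'libya', 'nba'] ['miami heat', 'nfl lockout']
```

Splitting within each user, rather than by a single global date, guarantees that every test query comes from a user with earlier training history.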
Slide 14
Query distribution in latent user groups
Annotated themes: breaking news events, entertainment, sports, celebrities, country names
Group | Top ranked queries
1 | iran, china, libya, vietnam, syria
2 | selena gomez, lady gaga, britney spears, jennifer aniston, taylor swift
3 | fake tupac story, pbs hackers, alaska earthquake, southwest pilot, arizona wildfires
4 | joplin missing, apple icloud, sony hackers, google subpoena, ford transmission
5 | casey anthony trial, casey anthony jurors, casey anthony, crude oil prices, air france flight 447
6 | tree of life, game of thrones, sonic the hedgehog, world of warcraft, mtv awards 2011
7 | the titanic, the bachelorette, cars 2, hangover 2, the voice
8 | los angeles lakers, arsenal football, the dark knight rises, transformers 3, manchester united
9 | miami heat, los angeles lakers, liverpool football club, arsenal football, nfl lockout
10 | today in history, nascar 2011 schedule, today history, this day in history
Slide 15
Click preferences in latent user groups
[Figure: click preferences over ranking features (document age, query match in title, proximity in title, site authority) in latent user groups themed breaking news events, entertainment, sports, celebrities, country names, and “today in history”, compared with a global model]
Slide 16
Document ranking
Rank prediction in dpRank: averaged over posterior samples
Baselines:
URSVM: an independent SVM for each user
GRSVM: a global SVM for all users
TRSVM: Bian et al.’s Topical RankSVM
IRSVM: Giannopoulos et al.’s Intent RankSVM
For TRSVM and IRSVM, the cluster size k is determined by cross-validation
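The slide indicates that dpRank's rank prediction is averaged over posterior samples but does not give the scoring rule in text. One plausible reading, sketched here purely as an assumption, weights each group's linear ranker w_k · x by the posterior probability that user u issued query q from group k, then averages the score across Gibbs samples. Sample values and feature vectors are hypothetical.

```python
def group_posterior(pi_u, query_probs, query):
    """p(c = k | u, q) proportional to pi_u[k] * p(q | group k); falls back to pi_u if q is unseen."""
    unnorm = [p * qp.get(query, 0.0) for p, qp in zip(pi_u, query_probs)]
    total = sum(unnorm)
    return [v / total for v in unnorm] if total > 0 else list(pi_u)


def predict_score(samples, query, features):
    """Average over posterior samples of sum_k p(k | u, q) * (w_k . x)."""
    total = 0.0
    for pi_u, query_probs, weights in samples:
        post = group_posterior(pi_u, query_probs, query)
        total += sum(p * sum(w * x for w, x in zip(w_k, features)) for p, w_k in zip(post, weights))
    return total / len(samples)


def rank_documents(samples, query, docs):
    """Return document ids sorted by predicted score, best first."""
    return sorted(docs, key=lambda d: predict_score(samples, query, docs[d]), reverse=True)


# Two hypothetical posterior samples for one user: (pi_u, per-group p(Q), per-group weights).
sample_a = ([0.8, 0.2], [{"nba": 0.9, "iran": 0.1}, {"nba": 0.1, "iran": 0.9}], [[1.0, 0.0], [0.0, 1.0]])
sample_b = ([0.7, 0.3], [{"nba": 0.8, "iran": 0.2}, {"nba": 0.2, "iran": 0.8}], [[0.9, 0.1], [0.1, 0.9]])
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0]}  # hypothetical feature vectors
print(rank_documents([sample_a, sample_b], "nba", docs))   # ['d1', 'd2']
print(rank_documents([sample_a, sample_b], "iran", docs))  # ['d2', 'd1']
```

The same user gets different rankings for different queries because the query shifts which group's preferences dominate.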
Slide 17
Document ranking
Quantitative comparison results
[Figure: document ranking performance of dpRank versus the query-centric and user-centric baselines]
Slide 18
Compute user similarity based on group membership
dpRank: based on each user’s search interest profile over the latent user groups
TRSVM & IRSVM: based on query cluster membership
QuerySim: treat each unique query as a user group
Used for collaborative document re-ranking
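To illustrate why group membership helps, the sketch compares users by the cosine similarity of their profiles. Cosine similarity is an assumption here, since the slide's similarity formulas are not in the text, and all profile values are hypothetical. Under QuerySim, users a and b share no identical query and look unrelated; under group-membership profiles they overlap through a shared sports group.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse profiles stored as {key: weight} dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(target, profiles, m):
    """Return the m users whose profiles are most similar to `target`'s, best first."""
    scored = [(user, cosine(profiles[target], p)) for user, p in profiles.items() if user != target]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:m]


# dpRank-style profiles: mixing proportions over latent user groups (hypothetical values).
groups = {"a": {"sports": 0.7, "celebrities": 0.3},
          "b": {"sports": 0.6, "celebrities": 0.1, "breaking news": 0.3},
          "c": {"breaking news": 1.0}}
# QuerySim-style profiles: every unique query acts as its own group (query counts).
queries = {"a": {"miami heat": 2, "lady gaga": 1},
           "b": {"los angeles lakers": 1, "nfl lockout": 2}}
print(most_similar("a", groups, m=1))      # user b, via the shared sports group
print(cosine(queries["a"], queries["b"]))  # 0.0
```

The top-M list returned by `most_similar` is the kind of neighbor set the collaborative re-ranking step consumes.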
Slide 19
Collaborative document re-ranking
Promote candidate documents by accumulating the clicks from the M most similar users to the target user u
[Figure: re-ranking results compared with the default ranker and the query-centric baselines]
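The re-ranking step can be sketched as follows. Weighting each neighbor's clicks by similarity and breaking ties with the default ranker's order are assumptions; the slide only states that clicks from the M most similar users are accumulated to promote candidate documents. The input formats are hypothetical.

```python
def collaborative_rerank(default_ranking, query, neighbors, click_logs):
    """Re-rank `default_ranking` for `query` using clicks from the M most similar users.

    neighbors: list of (user_id, similarity) pairs (hypothetical input format).
    click_logs: user_id -> {(query, doc_id): click_count} (hypothetical format).
    """
    accumulated = {doc: 0.0 for doc in default_ranking}
    for user, sim in neighbors:
        for (q, doc), clicks in click_logs.get(user, {}).items():
            if q == query and doc in accumulated:
                accumulated[doc] += sim * clicks  # similarity-weighted click accumulation
    position = {doc: i for i, doc in enumerate(default_ranking)}
    # Promote documents with more accumulated clicks; keep the default order on ties.
    return sorted(default_ranking, key=lambda doc: (-accumulated[doc], position[doc]))


default_ranking = ["d1", "d2", "d3"]    # output of the default ranker
neighbors = [("v1", 0.9), ("v2", 0.4)]  # M = 2 most similar users to u
click_logs = {"v1": {("nba", "d3"): 2},
              "v2": {("nba", "d2"): 1, ("iran", "d1"): 5}}
print(collaborative_rerank(default_ranking, "nba", neighbors, click_logs))  # ['d3', 'd2', 'd1']
```

With no neighbor clicks for the query, the output falls back to the default ranker's order unchanged.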
Slide 20
Road map
Motivation
Our solution: dpRank
Experimental results
Conclusions
Slide 21
Conclusions
dpRank: a unified modeling approach for users’ search behaviors
Latent user group: a homogeneous unit of queries and clicks
User: a heterogeneous mixture over the latent user groups
Non-parametric Bayesian: deals with the dynamic nature and scale of search logs
Future work
Incorporating more types of information about searchers: gender, location, age, social networks
Dependency among the queries: queries for the same search task
Slide 22
References
Jansen, B. J., Spink, A., & Saracevic, T. Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing & Management, 36(2), 207-227, 2000.
Kulkarni, A., Teevan, J., Svore, K. M., & Dumais, S. T. Understanding temporal query dynamics. In WSDM’11, pp. 167-176, 2011.
Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. Accurately interpreting clickthrough data as implicit feedback. In SIGIR’05, pp. 154-161, 2005.
Agichtein, E., Brill, E., Dumais, S., & Ragno, R. Learning user interaction models for predicting web search result preferences. In SIGIR’06, pp. 3-10, 2006.
Dupret, G. E., & Piwowarski, B. A user browsing model to predict search engine click data from past observations. In SIGIR’08, pp. 331-338, 2008.
Chapelle, O., & Zhang, Y. A dynamic Bayesian network click model for web search ranking. In WWW’09, pp. 1-10, 2009.
Bian, J., Li, X., Li, F., Zheng, Z., & Zha, H. Ranking specialization for web search: a divide-and-conquer approach by using topical RankSVM. In WWW’10, pp. 131-140, 2010.
Giannopoulos, G., Brefeld, U., Dalamagas, T., & Sellis, T. Learning to rank user intent. In CIKM’11, pp. 195-200, 2011.
Ferguson, T. S. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 209-230, 1973.
Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476), 1566-1581, 2006.
Slide 23
Thank you!
Q&A
dpRank: a unified modeling approach for users’ search behaviors
Slide 24
Wanted: a unified modeling approach
[Figure: a user’s queries on health care (insurance plan, obamacare, medicare, health reform, low cost insurance) and pop music (Beyonce, Rihanna, Shakira, Lady Gaga, Grammy Awards), together with result preferences over BM25 and PageRank]
Modeling of search interest
Modeling of result preferences
Slide 25
Wanted: a model capturing heterogeneity in users’ search behaviors
[Figure: one user’s queries form a “stock market” interest (stock market, market report, AAPL, apple, TWTR, FB, GOOG, fidelity online login) and a “nutrition of fruits” interest (apple, orange, banana, nutrition, fruit receipt, fruit smoothie); each interest has its own query distribution p(Q) and its own weighting of BM25 and PageRank]
Slide 26
Search log mining provides an opportunity
[Figure: search log flow User → Query → Documents → Clicks, with example queries: sochi winter Olympics, obamacare, affordable health care plan, super bowl 2014, health care reform]
Query-centric analysis: temporal query dynamics, Kulkarni et al. WSDM’11; query categories, Jansen et al. IPM 2000
Click-centric analysis: interpreting clickthrough data, Joachims et al. SIGIR’05
Isolated analysis vs. holistic view
Slide 27
Both queries and clicks reflect an individual user’s search intent
[Figure: two users share a health care interest (obamacare, medicare, insurance plan, health insurance, health reform, health policy, low cost insurance, affordable insurance) but differ elsewhere: one also follows pop music (Beyonce, Rihanna, Shakira, Lady Gaga, Grammy Awards), the other sports events (super bowl, NASCAR, Sochi, Lionel Messi, NBA all star); each user has preferences over BM25 and PageRank]
Slide 28
Our solution
Latent user group: a homogeneous unit of queries and clicks
[Figure: Group 1, …, Group k, each with a query distribution p(Q) over q1, q2, q3 and a ranking preference over features f1, f2]
Modeling of search interest
Modeling of result preferences
Slide 29
Gibbs sampling for posterior inference
Sampling the latent user group assignment of q_i in user u: combines the current group assignments in u, the global group proportion, and the data generation likelihood
Sampling the remaining latent variables: conjugacy leads to analytical solutions for two of them; Metropolis-Hastings sampling for the other
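The three annotated terms of the assignment step can be combined as in the standard direct-assignment Gibbs sampler for HDPs [Teh et al., 2006]: group k's weight is u's current count for k plus α times the global proportion β_k, multiplied by the likelihood of q_i and its clicks under group k. This sketch assumes dpRank follows that standard form. It also assumes a fixed K-group truncation, so the new-group term of the infinite model is omitted, and precomputed likelihoods. All state values are hypothetical.

```python
import random


def assignment_conditional(current_counts, beta, alpha, likelihoods):
    """Normalized p(c_i = k | rest) for one query q_i of user u under a K-group truncation.

    current_counts[k]: u's other queries currently assigned to group k (q_i excluded)
    beta[k]: global group proportion; likelihoods[k]: p(q_i and its clicks | group k)
    """
    weights = [(n + alpha * b) * lik for n, b, lik in zip(current_counts, beta, likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_assignment(current_counts, beta, alpha, likelihoods, rng):
    """One Gibbs step: draw a new group index for q_i from its full conditional."""
    probs = assignment_conditional(current_counts, beta, alpha, likelihoods)
    return rng.choices(range(len(probs)), weights=probs)[0]


counts, beta, lik = [3, 0, 1], [0.5, 0.3, 0.2], [0.01, 0.2, 0.05]  # hypothetical sampler state
probs = assignment_conditional(counts, beta, alpha=1.0, likelihoods=lik)
print([round(p, 3) for p in probs])  # [0.226, 0.387, 0.387]
rng = random.Random(3)
print(sample_assignment(counts, beta, 1.0, lik, rng))
```

Group 0 already holds most of u's queries yet loses probability here because q_i fits it poorly; the count term and the likelihood term pull against each other.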
Slide 30
Discussion
User-centric joint modeling of search behaviors
dpRank reveals information at two levels:
Aggregated level, i.e., the shared latent user groups, e.g., the ranking parameters of group k describe users’ common result ranking preference in group k
Individual level, i.e., user-specific mixing proportions profile an individual user’s search intent
Query-cluster-based solution: 1) => 2), i.e., queries are clustered first and ranking models are estimated afterwards
Slide 31
Document ranking II
Outputs of dpRank, and of TRSVM & IRSVM, used as additional ranking features for LambdaMART
Slide 32
Document ranking II
Feature importance in LambdaMART
Slide 33
Collaborative query recommendation
Promote candidate queries from the M most similar users to the target user u, and select the top 10 queries according to their score
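The recommendation step can be sketched as below, assuming a candidate query's score is the similarity-weighted count of how often the M most similar users issued it. The slide's own scoring formula is not given in the text; excluding the target user's past queries and breaking ties alphabetically are further assumptions, and the input formats are hypothetical.

```python
from collections import Counter


def recommend_queries(target_history, neighbors, query_logs, top_n=10):
    """Rank queries issued by the M most similar users and keep the top_n.

    neighbors: (user_id, similarity) pairs (hypothetical format).
    query_logs: user_id -> list of issued queries (hypothetical format).
    """
    seen = set(target_history)
    scores = Counter()
    for user, sim in neighbors:
        for q in query_logs.get(user, []):
            if q not in seen:      # assumption: do not re-recommend u's own queries
                scores[q] += sim   # similarity-weighted vote for each occurrence
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken alphabetically
    return [q for q, _ in ranked[:top_n]]


neighbors = [("v1", 0.9), ("v2", 0.5)]
query_logs = {"v1": ["miami heat", "nfl lockout", "nba"], "v2": ["nba", "lady gaga"]}
print(recommend_queries(["miami heat"], neighbors, query_logs))  # ['nba', 'nfl lockout', 'lady gaga']
```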
Slide 34
Collaborative query recommendation