
User Modeling in Search Logs via A Non-parametric Bayesian Approach - PowerPoint Presentation

Uploaded On 2018-11-10


Hongning Wang 1 ChengXiang Zhai 1 Feng Liang 2 1 Department of Computer Science 2 Department of Statistics University of Illinois at UrbanaCha ID: 726557



Presentation Transcript

Slide1

User Modeling in Search Logs via A Non-parametric Bayesian Approach

Hongning Wang1, ChengXiang Zhai1, Feng Liang2

1 Department of Computer Science, 2 Department of Statistics

University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA

{wang296,czhai,liangf}@Illinois.edu

Anlei Dong, Yi Chang

Yahoo! Labs, 701 First Avenue, Sunnyvale, CA 94089 USA

{anlei,yichang}@yahoo-inc.com

Slide2

Need to understand users’ search intent!

schedule of the games on Sunday

any news event for the Olympics

what is non-parametric Bayes?

2/26/2014

WSDM'2014 @ New York City

Slide3

Mining search logs provides the opportunity

User, Query, Documents, Clicks

sochi winter Olympics

obamacare

affordable health care plan

super bowl 2014

sochi winter Olympics

health care reform

Query-centric analysis:

Query categories [Jansen et al. IPM 2000]

Temporal query dynamics [Kulkarni et al. WSDM’11]

Click-centric analysis:

Interpreting clickthrough data [Joachims et al. SIGIR’05, Agichtein et al. SIGIR’06]

Click modeling [Dupret and Piwowarski SIGIR’08, Chapelle and Zhang WWW’09]

Isolated analysis

Holistic view

Slide4

Prior art

Query-cluster-based approaches

Ranking Specialization for Web Search, Bian et al. WWW’10

Learning to rank user intents, Giannopoulos et al. CIKM’11

Divide-and-Conquer strategy

Group queries into clusters

Estimate independent ranking models for each cluster

No user-specific information is considered

Queries and clicks are still separately analyzed

Slide5

Road map

Motivation

Our solution: dpRank

Experimental results

Conclusions

Slide6

Latent user group: a homogeneous unit of query and clicks

[Figure: Group k, pairing a query distribution p(Q) over q1, q2, q3 (modeling of search interest) with a ranking model over features f1, f2 (modeling of result preferences)]

Our Contribution

Slide7

User: a heterogeneous mixture over the latent user groups

[Figure: a user's queries (Stock market, market report, AAPL, apple, TWTR, apple, orange, banana, nutrition, fruit receipt, fidelity online login, fruit smoothie, FB, GOOG) shown as a mixture over Group 1, Group 2, ..., Group k. Labeled groups: "Nutrition of fruits" and "Stock market", each with a query distribution p(Q) over q1, q2, q3 and ranking features such as BM25 and PageRank]

Our Contribution

Slide8

Generation of latent user groups: Dirichlet Process priors [Ferguson, 1973]

[Figure: Group 1, ..., Group k, ..., Group c, each with a query distribution p(Q) over q1, q2, q3 and a ranking model over features f1, f2]

Our Contribution
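A minimal sketch of how a Dirichlet Process prior can yield an open-ended set of group proportions, using the standard stick-breaking construction. The slide does not show the authors' construction, so the function name, the concentration parameter `gamma`, and the truncation level are illustrative assumptions.

```python
import random

def stick_breaking(gamma, truncation, rng):
    """Truncated stick-breaking weights for a DP with concentration gamma.

    Hypothetical helper: the truncation level is an approximation chosen
    for illustration, not something stated on the slide.
    """
    weights = []
    remaining = 1.0  # length of the stick that has not been assigned yet
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, gamma)   # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)             # leftover mass goes to the last group
    return weights

rng = random.Random(0)
beta = stick_breaking(gamma=1.0, truncation=10, rng=rng)
```

A smaller `gamma` concentrates mass on the first few groups; a larger one spreads it over more groups.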

Slide9

Another layer of DP to support infinite mixture of latent user groups [Teh et al., 2006]

[Figure: shared latent user groups Group 1, ..., Group k, ..., Group c, each with a query distribution p(Q) over q1, q2, q3 and a ranking model over features f1, f2; each user holds a mixture over Group 1, Group 2, ..., Group k]

Our Contribution
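A minimal sketch of the second layer, assuming the groups have been truncated to a finite set: each user's mixing proportions are then a Dirichlet draw centered on the global group proportions. The names `global_beta` and `alpha` and the example values are illustrative, not taken from the slides.

```python
import random

def dirichlet(alphas, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def user_group_proportions(global_beta, alpha, n_users, rng):
    """Per-user proportions over shared groups (hypothetical helper).

    Under a finite truncation, a user-level DP with base measure
    global_beta reduces to Dirichlet(alpha * global_beta).
    """
    params = [alpha * b for b in global_beta]
    return [dirichlet(params, rng) for _ in range(n_users)]

rng = random.Random(1)
global_beta = [0.5, 0.3, 0.15, 0.05]   # made-up global group proportions
pi = user_group_proportions(global_beta, alpha=5.0, n_users=3, rng=rng)
```

All users share the same groups, but each row of `pi` weights them differently, which is the heterogeneity the slide describes.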

Slide10

A fully generative model for users’ search behaviors

dpRank model

1. Draw latent user groups from DP:

2. Draw group membership for each user from DP:

3. To generate a query in user u:

3.1 Draw a latent user group c:

3.2 Draw query q_i for user u accordingly:

3.3 Draw click preferences for q_i accordingly:

Our Contribution
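The per-query steps 3.1 to 3.3 can be sketched as below. The distributions on the slide were lost in the transcript, so this sketch assumes a categorical distribution over query strings and a logistic pairwise preference on feature differences; those choices, and every name and value in the block, are assumptions for illustration only.

```python
import math
import random

def generate_query_event(user_pi, group_query_probs, group_weights, doc_pair, rng):
    """Sketch of steps 3.1 to 3.3 for one query of one user (names hypothetical)."""
    # 3.1 Draw a latent user group c from the user's group proportions.
    c = rng.choices(range(len(user_pi)), weights=user_pi, k=1)[0]
    # 3.2 Draw a query from group c (assumed: categorical over query strings).
    queries = sorted(group_query_probs[c])
    probs = [group_query_probs[c][q] for q in queries]
    query = rng.choices(queries, weights=probs, k=1)[0]
    # 3.3 Draw a click preference between two documents
    #     (assumed: logistic model on the feature difference).
    x_a, x_b = doc_pair
    margin = sum(w * (a - b) for w, a, b in zip(group_weights[c], x_a, x_b))
    prefers_a = rng.random() < 1.0 / (1.0 + math.exp(-margin))
    return c, query, prefers_a

rng = random.Random(2)
user_pi = [0.7, 0.3]
group_query_probs = [
    {"stock market": 0.5, "market report": 0.3, "AAPL": 0.2},
    {"nutrition": 0.4, "fruit smoothie": 0.35, "banana": 0.25},
]
group_weights = [[0.2, 1.5], [1.2, 0.1]]   # made-up weights on [BM25, PageRank]
doc_pair = ([0.9, 0.1], [0.3, 0.8])        # feature vectors of two documents
c, query, prefers_a = generate_query_event(
    user_pi, group_query_probs, group_weights, doc_pair, rng)
```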

Slide11

Latent variables of interest

… and … characterize the generation of queries in a latent user group

… depicts users’ result ranking preferences in a latent user group

… profiles a user’s search intent over the latent user groups

Gibbs sampling for posterior inference

Slide12

Road map

Motivation

Our solution: dpRank

Experimental results

Conclusions

Slide13

Data collection

Yahoo! News search logs, May to July 2011

65 ranking features for each query-URL pair, e.g., document age, site authority, query matching in title

Aggregate URL features for query features [Bian et al. WWW’10]

For each user, chronologically, the first 60% of queries for training, the remaining 40% for testing
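The per-user chronological 60/40 split could look like the sketch below. The record layout `(user_id, timestamp, query)` and the function name are assumptions; the slides do not describe the log format.

```python
from collections import defaultdict

def chronological_split(records, train_percent=60):
    """Split each user's queries by time: earliest part for training, rest for testing.

    records: iterable of (user_id, timestamp, query) tuples (hypothetical layout).
    """
    by_user = defaultdict(list)
    for user_id, timestamp, query in records:
        by_user[user_id].append((timestamp, query))
    train, test = {}, {}
    for user_id, events in by_user.items():
        events.sort(key=lambda e: e[0])             # oldest query first
        cut = (len(events) * train_percent) // 100  # integer math avoids float rounding
        train[user_id] = [q for _, q in events[:cut]]
        test[user_id] = [q for _, q in events[cut:]]
    return train, test

records = [
    ("u1", 3, "casey anthony trial"),
    ("u1", 1, "iran"),
    ("u1", 2, "miami heat"),
    ("u1", 5, "cars 2"),
    ("u1", 4, "nfl lockout"),
]
train, test = chronological_split(records)
```

Splitting within each user, rather than across the whole log, keeps every user represented in both the training and the test portion.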

Slide14

Query distribution in latent user groups

Category annotations: breaking news events, entertainment, sports, celebrities, country names

Group | Top Ranked Queries
1 | iran, china, libya, vietnam, syria
2 | selena gomez, lady gaga, britney spears, jennifer aniston, taylor swift
3 | fake tupac story, pbs hackers, alaska earthquake, southwest pilot, arizona wildfires
4 | joplin missing, apple icloud, sony hackers, google subpoena, ford transmission
5 | casey anthony trial, casey anthony jurors, casey anthony, crude oil prices, air france flight 447
6 | tree of life, game of thrones, sonic the hedgehog, world of warcraft, mtv awards 2011
7 | the titanic, the bachelorette, cars 2, hangover 2, the voice
8 | los angeles lakers, arsenal football, the dark knight rises, transformers 3, manchester united
9 | miami heat, los angeles lakers, liverpool football club, arsenal football, nfl lockout
10 | today in history, nascar 2011 schedule, today history, this day in history

Slide15

Click preferences in latent user groups

[Figure: click preferences across latent user groups. Group labels: breaking news events, entertainment, sports, celebrities, country names, “today in history”. Features shown: document age, query match in title, proximity in title, site authority. Reference: Global model]

Slide16

Document ranking

Rank prediction in dpRank

posterior samples

Baselines

URSVM: independent SVM for each user

GRSVM: a global SVM for all users

TRSVM: Bian et al.’s Topical RankSVM

IRSVM: Giannopoulos et al.’s Intent RankSVM

cluster size k determined by cross-validation

Slide17

Document ranking

Quantitative comparison results

query-centric

user-centric

Slide18

Collaborative document re-ranking

Compute user similarity based on group membership

dpRank:

TRSVM & IRSVM:

QuerySim: treat each unique query as a user group

search interest profile
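One way to compare users by group membership is sketched below. The similarity formula on the slide did not survive in the transcript, so cosine similarity over each user's group proportions is an assumption here, and the profile values are made up.

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity between two group-membership vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)

def most_similar_users(target, profiles, m):
    """Return the m users closest to target as (user, similarity) pairs."""
    scored = [
        (cosine_similarity(profiles[target], profile), other)
        for other, profile in profiles.items()
        if other != target
    ]
    scored.sort(key=lambda s: (-s[0], s[1]))   # highest similarity first, ties by id
    return [(other, score) for score, other in scored[:m]]

# Hypothetical search interest profiles over three latent user groups.
profiles = {
    "u1": [0.8, 0.2, 0.0],
    "u2": [0.7, 0.3, 0.0],
    "u3": [0.0, 0.1, 0.9],
}
neighbors = most_similar_users("u1", profiles, m=1)
```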

Slide19

Collaborative document re-ranking

Promote candidate documents by

From M most similar users to the target user u

Accumulate the clicks

default ranker
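A minimal sketch of the promotion step, assuming the accumulated clicks are weighted by user similarity and added to the default ranker's score. The scoring formula on the slide was lost, so this combination rule, the `weight` parameter, and the data are hypothetical.

```python
def collaborative_rerank(candidates, default_scores, neighbors, clicks, weight=1.0):
    """Re-rank candidates using clicks accumulated from similar users.

    candidates: list of document ids
    default_scores: dict doc id -> score from the default ranker
    neighbors: list of (user, similarity) for the M most similar users
    clicks: dict user -> dict doc id -> click count
    """
    promoted = {}
    for doc in candidates:
        accumulated = sum(
            sim * clicks.get(user, {}).get(doc, 0) for user, sim in neighbors
        )
        promoted[doc] = default_scores[doc] + weight * accumulated
    # Highest promoted score first; ties broken by document id for determinism.
    return sorted(candidates, key=lambda d: (-promoted[d], d))

candidates = ["d1", "d2", "d3"]
default_scores = {"d1": 0.9, "d2": 0.8, "d3": 0.1}
neighbors = [("u2", 0.9), ("u3", 0.2)]
clicks = {"u2": {"d2": 2}, "u3": {"d3": 1}}
ranking = collaborative_rerank(candidates, default_scores, neighbors, clicks)
```

In this toy example `d2` overtakes `d1` because a highly similar user clicked it twice.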

query-centric

Slide20

Road map

Motivation

Our solution: dpRank

Experimental results

Conclusions

Slide21

Conclusions

dpRank: a unified modeling approach for users’ search behaviors

Latent user group: a homogeneous unit of query and clicks

User: a heterogeneous mixture over the latent user groups

Non-parametric Bayesian: deal with dynamic nature and scale of search logs

Future work

Incorporating more types of information about searchers

Gender, location, age, social networks

Dependency among the queries

Queries for the same search-task

Slide22

References

Jansen, B. J., Spink, A., & Saracevic, T. Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing & Management, 36(2), 207-227, 2000.

Kulkarni, A., Teevan, J., Svore, K. M., & Dumais, S. T. Understanding temporal query dynamics. In WSDM’11, pp. 167-176, 2011.

Joachims, T., Granka, L., Pan, B., Hembrooke, H., & Gay, G. Accurately interpreting clickthrough data as implicit feedback. In SIGIR’05, pp. 154-161, 2005.

Agichtein, E., Brill, E., Dumais, S., & Ragno, R. Learning user interaction models for predicting web search result preferences. In SIGIR’06, pp. 3-10, 2006.

Dupret, G. E., & Piwowarski, B. A user browsing model to predict search engine click data from past observations. In SIGIR’08, pp. 331-338, 2008.

Chapelle, O., & Zhang, Y. A dynamic Bayesian network click model for web search ranking. In WWW’09, pp. 1-10, 2009.

Bian, J., Li, X., Li, F., Zheng, Z., & Zha, H. Ranking specialization for web search: a divide-and-conquer approach by using topical RankSVM. In WWW’10, pp. 131-140, 2010.

Giannopoulos, G., Brefeld, U., Dalamagas, T., & Sellis, T. Learning to rank user intent. In CIKM’11, pp. 195-200, 2011.

Ferguson, T. S. A Bayesian analysis of some nonparametric problems. The Annals of Statistics, 209-230, 1973.

Teh, Y. W., Jordan, M. I., Beal, M. J., & Blei, D. M. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566-1581, 2006.

Slide23

Thank you!

Q&A

dpRank: a unified modeling approach for users’ search behaviors

Slide24

Wanted: a unified modeling approach

[Figure: example queries (Healthcare, insurance plan, obamacare, medicare, health reform, Beyonce, Rihanna, Shakira, Lady Gaga, pop music, low cost insurance, Grammy Awards) shown with ranking features BM25 and PageRank]

Modeling of search interest

Modeling of result preferences

Slide25

Wanted: a model capturing heterogeneity in users’ search behaviors

[Figure: queries (Stock market, market report, AAPL, apple, TWTR, apple, orange, banana, nutrition, fruit receipt, fidelity online login, fruit smoothie, FB, GOOG) shown under two interests, "stock market" and "nutrition of fruits". Each interest has its own query distribution p(Q) (market report, stock market, apple; apple, fruit receipt, nutrition) and its own ranking features BM25 and PageRank]

Slide26

Search log mining provides an opportunity

User, Query, Documents, Clicks

sochi winter Olympics

obamacare

affordable health care plan

super bowl 2014

sochi winter Olympics

health care reform

Query-centric analysis:

Query categories, Jansen et al. IPM 2000

Temporal query dynamics, Kulkarni et al. WSDM’11

Click-centric analysis:

Interpreting clickthrough data, Joachims et al. SIGIR’05

Isolated analysis

Holistic view

Slide27

Both queries and clicks reflect an individual user’s search intent

[Figure: two example query sets, each shown with ranking features BM25 and PageRank. First set: Healthcare, insurance plan, obamacare, medicare, health reform, Beyonce, Rihanna, Shakira, Lady Gaga, pop music, low cost insurance, Grammy Awards. Second set: Healthcare, health insurance, obamacare, medicare, health policy, super bowl, NASCAR, Sochi, Lionel Messi, sports events, affordable insurance, NBA all star]

Slide28

Our solution

Latent user group: a homogeneous unit of query and clicks

[Figure: Group 1, ..., Group k, each pairing a query distribution p(Q) over q1, q2, q3 with a ranking model over features f1, f2]

Modeling of search interest

Modeling of result preferences

Slide29

Gibbs sampling for posterior inference

Sampling …

Latent user group assignment of q_i in u

Sampling …

Conjugacy leads to analytical solutions for … and …

Metropolis-Hastings sampling for …

current group assignment in u

data generation likelihood

global group proportion
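The three annotated terms match the usual direct-assignment Gibbs update for a hierarchical Dirichlet process. Assuming that form, the probability of assigning a query to group k is proportional to (count of the user's other queries in k + alpha times the global proportion of k) times the likelihood of the data under k. The sketch below truncates to existing groups and omits the new-group case; the function name, `alpha`, and the numbers are illustrative assumptions.

```python
import random

def sample_group_assignment(user_counts, global_beta, alpha, likelihoods, rng):
    """One Gibbs draw of a query's latent group (truncated, no new-group option).

    user_counts: the user's current assignments per group, excluding this query
    global_beta: global group proportions
    likelihoods: data generation likelihood of this query and its clicks per group
    """
    weights = [
        (n + alpha * b) * lik
        for n, b, lik in zip(user_counts, global_beta, likelihoods)
    ]
    total = sum(weights)
    probs = [w / total for w in weights]   # normalized conditional posterior
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return k, probs

rng = random.Random(3)
k, probs = sample_group_assignment(
    user_counts=[3, 0, 1],
    global_beta=[0.5, 0.3, 0.2],
    alpha=1.0,
    likelihoods=[0.1, 0.4, 0.2],
    rng=rng,
)
```

A group the user has not used yet (the second one here) can still be chosen, because the global proportion gives it nonzero prior weight.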

Slide30

Discussion

User-centric joint modeling of search behaviors

dpRank reveals information at

aggregated level, i.e., the shared latent user groups, e.g., … describes users’ common result ranking preference in group k

individual level, i.e., user-specific mixing proportions; … profiles an individual user’s search intent

current group assignment in u

data generation likelihood

global group proportion

Query-cluster based solution: 1) => 2)

Slide31

Document ranking II

Output as additional ranking features for LambdaMART

dpRank:

TRSVM & IRSVM:

Slide32

Document ranking II

Feature importance in LambdaMART

Slide33

Collaborative query recommendation

Promote candidate queries by

From M most similar users to the target user u

Select top 10 queries according to

Slide34

Collaborative query recommendation

Promote candidate queries by

From M most similar users to the target user u

Select top 10 queries according to