Using Large Scale Log Analysis to Understand Human Behavior
Presentation Transcript

Slide 1

Using Large Scale Log Analysis to Understand Human Behavior

Jaime Teevan, Microsoft Research

JITP 2011

Slide 2

David Foster Wallace

Mark Twain

"Cowards die many times before their deaths." (Annotated by Nelson Mandela)

"I have discovered a truly marvelous proof ... which this margin is too narrow to contain." (Pierre de Fermat, 1637)

Students prefer used textbooks that are annotated. [Marshall 1998]

Slide 3

Digital Marginalia

Do we lose marginalia with digital documents?
- The Internet exposes information experiences
- Metadata, annotations, relationships
- Large-scale information usage data

Change in focus
- With marginalia, interest is in the individual
- Now we can look at experiences in the aggregate

Slide 4
Slide 5

http://hint.fm/seer

Slide 6

Defining Behavioral Log Data

Behavioral log data are:
- Traces of human behavior, seen through a sensor
- Actual, real-world behavior; not recalled behavior or subjective impressions
- Large-scale, real-time

Behavioral log data are not:
- Non-behavioral sources of large-scale data
- Collected data (e.g., poll data, surveys, census data)
- Crowdsourced data (e.g., Mechanical Turk)

Slide 7

Real-World, Large-Scale, Real-Time

Private behavior is exposed
- Example: Porn queries, medical queries

Rare behavior is common
- Example: Observe 500 million queries a day
- Interested in behavior that occurs 0.002% of the time
- Still observe the behavior 10 thousand times a day! (arithmetic checked below)

New behavior appears immediately
- Example: Google Flu Trends
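The scale claim is simple arithmetic; a quick sanity check in Python, using only the numbers from the slide:

```python
queries_per_day = 500_000_000   # queries observed per day (slide's example)
rate = 0.002 / 100              # behavior occurring 0.002% of the time
print(queries_per_day * rate)   # -> 10000.0 observations per day
```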

Slide 8

Overview
- How behavioral log data can be used
- Sources of behavioral log data
- Challenges with privacy and data sharing
- Example analysis of one source: query logs
  - To understand people's information needs
  - To experiment with different systems
- What behavioral logs cannot reveal
- How to address limitations

Slide 9

Practical Uses for Behavioral Data

Behavioral data to improve Web search
- Offline log analysis. Example: Re-finding is common, so add history support
- Online log-based experiments. Example: Interleave different rankings to find the best algorithm
- Log-based functionality. Example: Boost clicked results in a search result list

Behavioral data on the desktop
- Goal: Allocate editorial resources to create Help docs
- How to do so without knowing what people search for?

Slide 10

Societal Uses of Behavioral Data
- Understand people's information needs
- Understand what people talk about
- Impact public policy? (E.g., DonorsChoose.org)

[Baeza-Yates et al. 2007]

Slide 11

Generalizing About Behavior

[Diagram: observed signals generalize in layers, from button clicks and feature use, through information use and information needs, to what people think and human behavior; illustrated with a "jitp 2011" query.]

Slide 12

Personal Use of Behavioral Data
- Individuals now have a lot of behavioral data
- Introspection of personal data is popular (My Year in Status, Status Statistics)
- Expect to see more: as compared to others, for a purpose

Slide 13

Overview
- Behavioral logs give practical, societal, personal insight
- Sources of behavioral log data
- Challenges with privacy and data sharing
- Example analysis of one source: query logs
  - To understand people's information needs
  - To experiment with different systems
- What behavioral logs cannot reveal
- How to address limitations

Slide 14

Web Service Logs

Example sources: search engines, commercial websites
Types of information: behavior (queries, clicks); content (results, products)
Example analysis: query ambiguity
- Teevan, Dumais & Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR 2008.

[Screenshots: results for an ambiguous query span Integral Theory and Practice, Parenting, and IT & Politics.]

Slide 15

Public Web Service Content

Example sources: social network sites, wiki change logs
Types of information: public content (dependent on the service)
Example analysis: Twitter topic models
- Ramage, Dumais & Liebling. Characterizing microblogging using latent topic models. ICWSM 2010.
- http://twahpic.cloudapp.net

Slide 16

Web Browser Logs

Example sources: proxies, toolbars
Types of information: behavior (URL visits); content (settings, pages)
Example analysis: Diff-IE (http://bit.ly/DiffIE)
- Teevan, Dumais & Liebling. A longitudinal study of how highlighting Web content change affects people's Web interactions. CHI 2010.

Slide 17

Web Browser Logs

Example sources: proxies, toolbars
Types of information: behavior (URL visits); content (settings, pages)
Example analysis: webpage revisitation
- Adar, Teevan & Dumais. Large scale analysis of Web revisitation patterns. CHI 2008.

Slide 18

Client-Side Logs

Example sources: client applications, operating systems
Types of information: Web client interactions; other interactions (rich!)
Example analysis: Stuff I've Seen
- Dumais, Cutrell, Cadiz, Jancke, Sarin & Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003.

Slide 19

Types of Logs: Rich and Varied

Sources of log data:
- Web services: search engines, commerce sites
- Public Web services: social network sites, wiki change logs
- Web browsers: proxies, toolbars or plug-ins
- Client applications

Types of information logged:
- Interactions: posts and edits; queries and clicks; URL visits; system interactions
- Context: results, ads, Web pages shown

Slide 20

Public Sources of Behavioral Logs

Public Web service content
- Twitter, Facebook, Digg, Wikipedia
- At JITP: InfoExtractor, Facebook Harvester, scraping tools

Research efforts to create logs
- At JITP: Roxy, a research proxy
- Lemur Community Query Log Project (http://lemurstudy.cs.umass.edu/)
  - 1 year of data collection = 6 seconds of Google logs

Publicly released private logs
- DonorsChoose.org (http://developer.donorschoose.org/the-data)
- Enron corpus, AOL search logs, Netflix ratings

Slide 21

Example: AOL Search Dataset

August 4, 2006: Logs released to the academic community
- 3 months, 650 thousand users, 20 million queries
- Logs contain anonymized user IDs
August 7, 2006: AOL pulled the files, but they were already mirrored
August 9, 2006: The New York Times identified Thelma Arnold
- "A Face Is Exposed for AOL Searcher No. 4417749"
- Queries for businesses and services in Lilburn, GA (pop. 11k)
- Queries for Jarrett Arnold (and others of the Arnold clan)
- The NYT contacted all 14 people in Lilburn with the Arnold surname
- When contacted, Thelma Arnold acknowledged her queries
August 21, 2006: Two AOL employees fired, CTO resigned
September 2006: Class action lawsuit filed against AOL

Sample records:

AnonID   Query                          QueryTime            ItemRank  ClickURL
-------  -----------------------------  -------------------  --------  ----------------------------------------
1234567  jitp                           2006-04-04 18:18:18  1         http://www.jitp.net/
1234567  jipt submission process        2006-04-04 18:18:18  3         http://www.jitp.net/m_mscript.php?p=2
1234567  computational social scinece   2006-04-24 09:19:32
1234567  computational social science   2006-04-24 09:20:04  2         http://socialcomplexity.gmu.edu/phd.php
1234567  seattle restaurants            2006-04-24 09:25:50  2         http://seattletimes.nwsource.com/rests
1234567  perlman montreal               2006-04-24 10:15:14  4         http://oldwww.acm.org/perlman/guide.html
1234567  jitp 2006 notification         2006-05-20 13:13:13
…

(The misspellings "jipt" and "scinece" appear in the sample as typed; real queries contain typos.)
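Records in this layout are straightforward to parse. A minimal sketch, assuming a tab-separated file with the header row shown above; the field names follow the column headers, and none of this is AOL's actual tooling:

```python
import csv
from collections import namedtuple

Record = namedtuple("Record", "anon_id query query_time item_rank click_url")

def read_query_log(path):
    """Yield one Record per line; rank and URL are None on query-only rows."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)                                 # skip the header row
        for row in reader:
            row += [""] * (5 - len(row))             # pad query-only rows
            anon_id, query, qtime, rank, url = row[:5]
            yield Record(anon_id, query, qtime,
                         int(rank) if rank else None,
                         url or None)
```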

Slide 22

Example: AOL Search Dataset

Other well-known AOL users
- User 927: how to kill your wife
- User 711391: i love alaska (http://www.minimovies.org/documentaires/view/ilovealaska)

Anonymous IDs do not make logs anonymous
- Logs contain directly identifiable information: names, phone numbers, credit cards, social security numbers
- Logs contain indirectly identifiable information. Example: Thelma's queries
- Birthdate, gender, and zip code identify 87% of Americans

Slide 23

Example: Netflix Challenge

October 2, 2006: Netflix announces contest
- Predict people's ratings, for a $1 million prize
- 100 million ratings, 480k users, 17k movies
- Very careful with anonymity post-AOL
May 18, 2008: Data de-anonymized
- Paper published by Narayanan & Shmatikov
- Uses background knowledge from IMDB
- Robust to perturbations in the data
December 17, 2009: Doe v. Netflix
March 12, 2010: Netflix cancels second competition

Sample data:

Ratings:
1:                    [Movie 1 of 17770]
12, 3, 2006-04-18     [CustomerID, Rating, Date]
1234, 5, 2003-07-08   [CustomerID, Rating, Date]
2468, 1, 2005-11-12   [CustomerID, Rating, Date]
…

Movie titles:
10120, 1982, "Bladerunner"
17690, 2007, "The Queen"
…

"All customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy... Even if, for example, you knew all your own ratings and their dates you probably couldn't identify them reliably in the data because only a small sample was included (less than one tenth of our complete dataset) and that data was subject to perturbation."

Slide 24

Overview
- Behavioral logs give practical, societal, personal insight
- Sources include Web services, browsers, client apps
- Public sources are limited due to privacy concerns
- Example analysis of one source: query logs
  - To understand people's information needs
  - To experiment with different systems
- What behavioral logs cannot reveal
- How to address limitations

Slide 25

A sample query log:

Query                         Time                User
----------------------------  ------------------  ------
jitp 2011                     10:41 am  5/15/11   142039
social science                10:44 am  5/15/11   142039
computational social science  10:56 am  5/15/11   142039
jitp 2011                     11:21 am  5/15/11   659327
crowne plaza seattle          11:59 am  5/15/11   318222
restaurants seattle           12:01 pm  5/15/11   318222
pikes market restaurants      12:17 pm  5/15/11   318222
stuart shulman                12:18 pm  5/15/11   142039
daytrips in seattle, wa       1:30 pm   5/15/11   554320
jitp 2011                     1:30 pm   5/15/11   659327
jitp program                  2:32 pm   5/15/11   435451
jitp2011.org                  2:42 pm   5/15/11   435451
computational social science  4:56 pm   5/15/11   142039
jitp 2011                     5:02 pm   5/15/11   312055
xxx clubs in seattle          10:14 pm  5/15/11   142039
sex videos                    1:49 am   5/16/11   142039

Slide 26

The same log before cleaning contains additional noisy rows, for example:

teen sex                      10:56 am  5/15/11   142039
sex with animals              1:30 pm   5/15/11   659327
cheap digital camera          12:17 pm  5/15/11   554320
cheap digital camera          12:18 pm  5/15/11   554320
cheap digital camera          12:19 pm  5/15/11   554320
社会科                        11:59 am  11/3/23
                              12:01 pm  11/3/23

Noise categories highlighted on the slide: porn, language, spam, system errors.

Data cleaning pragmatics (sketched in code below):
- A significant part of data analysis
- Ensure cleaning is appropriate
- Keep track of the cleaning process
- Keep the original data around
- Example: ClimateGate
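One way to act on these pragmatics: filter into a cleaned copy while leaving the raw log untouched and recording what each rule removed. A minimal sketch; the rules and the regular expression are illustrative placeholders, not the deck's actual pipeline:

```python
import re
from collections import Counter

ADULT = re.compile(r"\b(sex|xxx|porn)\b", re.I)   # toy pattern, not a production filter

def is_ascii(query):
    return all(ord(ch) < 128 for ch in query)

def clean(records):
    """Return (kept, removed_counts); never mutates or discards the raw input."""
    kept, removed = [], Counter()
    for r in records:
        if not r["query"].strip():
            removed["system error"] += 1          # e.g., empty query rows
        elif ADULT.search(r["query"]):
            removed["adult"] += 1
        elif not is_ascii(r["query"]):
            removed["other language"] += 1        # or route to per-language analysis
        else:
            kept.append(r)
    return kept, removed

raw = [{"query": "jitp 2011"}, {"query": "sex videos"}, {"query": "社会科"}]
cleaned, audit = clean(raw)                       # the raw list is left intact
print(len(cleaned), dict(audit))                  # -> 1 {'adult': 1, 'other language': 1}
```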

Slide 27

[The cleaned query log table from Slide 25 is shown again.]

Slide 28

[The same query log table, annotated:]

Query typology

Slide 29

[The same query log table, annotated:]

Query typology
Query behavior

Slide 30

[The same query log table, annotated:]

Query typology
Query behavior
Long term trends

Uses of Analysis
- Ranking (e.g., precision)
- System design (e.g., caching)
- User interface (e.g., history)
- Test set development
- Complementary research

Slide 31

Things Observed in Query Logs

Summary measures (sketched in code below)
- Query frequency: queries appear 3.97 times on average [Silverstein et al. 1999]
- Query length: 2.35 terms on average [Jansen et al. 1998]

Analysis of query intent
- Query types and topics: navigational, informational, transactional [Broder 2002]

Temporal features
- Session length: sessions are 2.20 queries long on average [Silverstein et al. 1999]
- Common re-formulations [Lau and Horvitz 1999]

Click behavior
- Relevant results for a query
- Queries that lead to clicks [Joachims 2002]
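These summary statistics are easy to recompute on any query log. A minimal sketch, assuming records sorted by user and then by time, and the common (but not universal) 30-minute inactivity cutoff for session breaks:

```python
from collections import Counter
from datetime import timedelta

SESSION_GAP = timedelta(minutes=30)   # assumed inactivity timeout between sessions

def summarize(log):
    """log: (user, query, timestamp) tuples, sorted by user and then by time."""
    freq, term_counts, session_sizes = Counter(), [], []
    last_user, last_ts = None, None
    for user, query, ts in log:
        freq[query] += 1
        term_counts.append(len(query.split()))      # query length in terms
        if user != last_user or ts - last_ts > SESSION_GAP:
            session_sizes.append(1)                 # a new session starts
        else:
            session_sizes[-1] += 1                  # extend the current session
        last_user, last_ts = user, ts
    return {
        "mean terms per query": sum(term_counts) / len(term_counts),
        "mean queries per session": sum(session_sizes) / len(session_sizes),
        "mean occurrences per distinct query": sum(freq.values()) / len(freq),
    }
```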

Slide 32

Surprises About Query Log Data

From early log analysis (examples: Jansen et al. 2000, Broder 1998):
- Queries are not 7 or 8 words long
- Advanced operators are not used, or are "misused"
- Nobody used relevance feedback
- Lots of people search for sex
- Navigation behavior is common
- Prior experience was with library search

Slide 33

Surprises About Microblog Search?

Slide 34

Surprises About Microblog Search?

[Screenshots of Twitter search results, ordered by time vs. ordered by relevance, with an "8 new tweets" notification.]

Slide 35

Surprises About Microblog Search?

[Same Twitter search screenshots, with a comparison:]

Microblog search:
- Time important
- People important
- Specialized syntax
- Queries common, repeated a lot, change very little
- Often navigational

Web search:
- Time and people less important
- No syntax use
- Queries longer
- Queries develop

Slide 36

Generalizing Across Systems

Levels of generalization, from narrowest to broadest:
- A particular feature: Bing experiment #123
- A web search engine: Bing
- Web search engines: Bing, Google, Yahoo
- Search engines: different corpora
- Information seeking: browser, search, email

Goals at each level: build new features, build new tools, build better systems

Slide 37

Partitioning the Data
- Corpus
- Language
- Location
- Device
- Time
- User
- System variant

[Baeza-Yates et al. 2007]

Slide 38

Partition by Time (hourly bucketing sketched below)
- Periodicities
- Spikes
- Real-time data: new behavior, immediate feedback
- Individual: within session, across sessions

[Beitzel et al. 2004]
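A first look at periodicity is just bucketing events by hour of day. A toy sketch of that partitioning step (the Beitzel et al. analysis is far richer):

```python
from collections import Counter
from datetime import datetime

def hourly_profile(timestamps):
    """Count events per hour of day to expose daily periodicity."""
    by_hour = Counter(ts.hour for ts in timestamps)
    return [by_hour.get(h, 0) for h in range(24)]

stamps = [datetime(2011, 5, 15, 10, 41), datetime(2011, 5, 15, 10, 44),
          datetime(2011, 5, 15, 22, 14)]
print(hourly_profile(stamps))   # two events in hour 10, one in hour 22
```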

Slide 39

Partition by User

Temporary ID (e.g., cookie, IP address)
- High coverage but high churn
- Does not necessarily map directly to users

User account
- Only a subset of users

[Teevan et al. 2007]

Slide 40

Partition by System Variant
- Also known as controlled experiments
- Some people see one variant, others another (assignment sketched below)
- Example: What color for search result links?
  - Bing tested 40 colors and identified #0044CC
  - Value: $80 million
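Experiments like the link-color test need a stable, roughly uniform assignment of users to variants; hashing a persistent user ID is a standard way to get one. A minimal sketch (the variant list and experiment name are made up; this is not Bing's infrastructure):

```python
import hashlib

VARIANTS = ["#0044CC", "#0033BB", "#0055DD"]   # hypothetical link colors under test

def assign_variant(user_id: str, experiment: str, variants=VARIANTS) -> str:
    """Stable bucket: the same user and experiment always hash to the same variant."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

print(assign_variant("142039", "link-color-test"))   # deterministic per user
```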

Slide 41

Everything is Significant
- Everything is significant, but not always meaningful
- Choose the metrics you care about first
- Look for converging evidence
- Choose the comparison group carefully, from the same time period
- Log a lot, because it can be hard to recreate state
- Confirm with metrics that should be the same
- Variance is high, so calculate it empirically (bootstrap sketch below)
- Look at the data
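"Calculate it empirically" can be as simple as bootstrap resampling, resampling users rather than individual events since one user's actions are correlated. A minimal sketch with made-up data:

```python
import random
import statistics

def bootstrap_ci(per_user_metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean of a per-user metric."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(per_user_metric, k=len(per_user_metric)))
        for _ in range(n_boot)
    )
    return (means[int(n_boot * alpha / 2)],
            means[int(n_boot * (1 - alpha / 2)) - 1])

clicks_per_user = [1.1, 2.4, 0.0, 3.2, 1.8, 0.7, 2.1]   # toy per-user values
print(bootstrap_ci(clicks_per_user))                    # lower and upper bounds on the mean
```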

Slide 42

Overview
- Behavioral logs give practical, societal, personal insight
- Sources include Web services, browsers, client apps
- Public sources are limited due to privacy concerns
- Partition query logs to view interesting slices
  - By corpus, time, individual
  - By system variant = experiment
- What behavioral logs cannot reveal
- How to address limitations

Slide 43

What Logs Cannot Tell Us
- People's intent
- People's success
- People's experience
- People's attention
- People's beliefs of what happens

Behavior can mean many things: 81% of search sequences are ambiguous [Viermetz et al. 2006]

[Example timeline: the logged sequence "7:12 Query; 7:14 Click Result 1; 7:15 Click Result 3; <Back to results>; <Open in new tab>" is consistent with both "7:16 Read Result 1; 7:20 Read Result 3; 7:27 Save links locally" and "7:16 Try new engine".]

Slide 44

Example: Click Entropy

Question: How ambiguous is a query?
Approach: Look at variation in clicks [Teevan et al. 2008]
Measure: click entropy (computed in the sketch below)
- Low if no variation, e.g., journal of information
- High if lots of variation, e.g., jitp

[Screenshots: results for "jitp" span Integral Theory and Practice, Parenting, and IT & Politics.]
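Click entropy is the Shannon entropy of the distribution of results clicked for a query; a minimal sketch:

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    """H = -sum over urls of p(url) * log2 p(url), over all clicks for one query."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(click_entropy(["jitp.net"] * 9 + ["jitp.org"]))        # low: ~0.47 bits
print(click_entropy(["a.com", "b.com", "c.com", "d.com"]))   # high: 2.0 bits
```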

Slide 45

Which Has Less Variation in Clicks?
- www.usajobs.gov vs. federal government jobs
- find phone number vs. msn live search
- singapore pools vs. singaporepools.com
- tiffany vs. tiffany's
- nytimes vs. connecticut newspapers
- campbells soup recipes vs. vegetable soup recipe
- soccer rules vs. hockey equipment

Click entropy alone can be misleading:
- Results change
- Result quality varies
- Task impacts the number of clicks (e.g., clicks/user = 1.1 vs. 2.1; click position = 2.6 vs. 1.6; result entropy = 5.7 vs. 10.7)

Slide 46

Beware of Adversaries

Robots try to take advantage of your service
- Queries too fast or too common to be from a human
- Queries too specialized (and repeated) to be real

Spammers try to influence your interpretation
- Click fraud, link farms, misleading content

A never-ending arms race
- Look for unusual clusters of behavior
- Adversarial use of log data

[Fetterly et al. 2004]

Slide 47

Beware of the Tyranny of the Data

Logs can provide insight into behavior
- Example: What is searched for, how needs are expressed
Logs can be used to test hypotheses
- Example: Compare ranking variants or link colors
Logs can only reveal what can be observed
- They cannot tell you what you cannot observe
- Example: "Nobody uses Twitter to re-find"

Slide 48

Supplementing Log Data

Enhance log data
- Collect associated information. Example: For browser logs, crawl the visited webpages
- Instrumented panels

Converging methods
- Usability studies, eye tracking, surveys, field studies, diary studies

Slide 49

Example: Re-Finding Intent

Large-scale log analysis of re-finding [Tyler and Teevan 2010]
- Do people know they are re-finding?
- Do they mean to re-find the result they do?
- Why are they returning to the result?

Small-scale critical-incident user study
- Browser plug-in that logs queries and clicks
- Pop-up survey on repeat clicks and on 1/8 of new clicks

Insight into intent + a rich, real-world picture
- Re-finding is often targeted towards a particular URL
- Not targeted when the query changes or within the same session

Slide 50

Summary
- Behavioral logs give practical, societal, personal insight
- Sources include Web services, browsers, client apps
- Public sources are limited due to privacy concerns
- Partition query logs to view interesting slices
  - By corpus, time, individual
  - By system variant = experiment
- Behavioral logs are powerful but not a complete picture
  - Can expose small differences and tail behavior
  - Cannot expose motivation, which is often adversarial
- Look at the logs, and supplement with complementary data

Slide 51

Jaime Teevan
teevan@microsoft.com

Questions?

Slide 52

References

Adar, E., J. Teevan and S.T. Dumais. Large scale analysis of Web revisitation patterns. CHI 2008.
Akers, D., M. Simpson, T. Winograd and R. Jeffries. Undo and erase events as indicators of usability problems. CHI 2009.
Baeza-Yates, R., G. Dupret and J. Velasco. A study of mobile search queries in Japan. Query Log Analysis: Social and Technological Challenges workshop, WWW 2007.
Beitzel, S.M., E.C. Jensen, A. Chowdhury, D. Grossman and O. Frieder. Hourly analysis of a very large topically categorized Web query log. SIGIR 2004.
Broder, A. A taxonomy of Web search. SIGIR Forum 36(2), 2002.
Chilton, L. and J. Teevan. Addressing information needs directly in the search result page. WWW 2011.
Cutrell, E., D.C. Robbins, S.T. Dumais and R. Sarin. Fast, flexible filtering with Phlat: Personal search and organization made easy. CHI 2006.
Dagon, D. Botnet detection and response: The network is the infection. OARC Workshop 2005.
Dasu, T. and T. Johnson. Exploratory data mining and data cleaning. 2004.
Dumais, S.T., E. Cutrell, J.J. Cadiz, G. Jancke, R. Sarin and D.C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003.
Fetterly, D., M. Manasse and M. Najork. Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. Workshop on the Web and Databases 2004.
Fox, S., K. Karnawat, M. Mydland, S.T. Dumais and T. White. Evaluating implicit measures to improve Web search. TOIS 23(2), 2005.
Jansen, B.J., A. Spink, J. Bateman and T. Saracevic. Real life information retrieval: A study of user queries on the Web. SIGIR Forum 32(1), 1998.
Joachims, T. Optimizing search engines using clickthrough data. KDD 2002.
Kellar, M., C. Watters and M. Shepherd. The impact of task on the usage of Web browser navigation mechanisms. GI 2006.
Kohavi, R., R. Longbotham, D. Sommerfield and R.M. Henne. Controlled experiments on the Web: Survey and practical guide. Data Mining and Knowledge Discovery 18(1), 2009.
Kohavi, R., R. Longbotham and T. Walker. Online experiments: Practical lessons. IEEE Computer 43(9), 2010.
Kotov, A., P. Bennett, R.W. White, S.T. Dumais and J. Teevan. Modeling and analysis of cross-session search tasks. SIGIR 2011.
Kulkarni, A., J. Teevan, K.M. Svore and S.T. Dumais. Understanding temporal query dynamics. WSDM 2011.
Lau, T. and E. Horvitz. Patterns of search: Analyzing and modeling Web query refinement. User Modeling 1999.
Marshall, C.C. The future of annotation in a digital (paper) world. GSLIS Clinic 1998.
Narayanan, A. and V. Shmatikov. Robust de-anonymization of large sparse datasets. IEEE Symposium on Security and Privacy 2008.
Silverstein, C., M. Henzinger, H. Marais and M. Moricz. Analysis of a very large Web search engine query log. SIGIR Forum 33(1), 1999.
Tang, D., A. Agarwal and D. O'Brien. Overlapping experiment infrastructure: More, better, faster experimentation. KDD 2010.
Teevan, J., E. Adar, R. Jones and M. Potts. Information re-retrieval: Repeat queries in Yahoo's logs. SIGIR 2007.
Teevan, J., S.T. Dumais and D.J. Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR 2008.
Teevan, J., S.T. Dumais and D.J. Liebling. A longitudinal study of how highlighting Web content change affects people's Web interactions. CHI 2010.
Teevan, J., D.J. Liebling and G.R. Geetha. Understanding and predicting personal navigation. WSDM 2011.
Teevan, J., D. Ramage and M.R. Morris. #TwitterSearch: A comparison of microblog search and Web search. WSDM 2011.
Tyler, S.K. and J. Teevan. Large scale query log analysis of re-finding. WSDM 2010.
Viermetz, M., C. Stolz, V. Gedov and M. Skubacz. Relevance and impact of tabbed browsing behavior on Web usage mining. Web Intelligence 2006.
Weinreich, H., H. Obendorf, E. Herder and M. Mayer. Off the beaten tracks: Exploring three aspects of Web navigation. WWW 2006.
White, R.W., S.T. Dumais and J. Teevan. Characterizing the influence of domain expertise on Web search behavior. WSDM 2009.