Using Large Scale Log Analysis to Understand Human Behavior
Jaime Teevan, Microsoft Research
JITP 2011
David Foster Wallace
Mark Twain
"Cowards die many times before their deaths." Annotated by Nelson Mandela
"I have discovered a truly marvelous proof ... which this margin is too narrow to contain." Pierre de Fermat (1637)

Students prefer used textbooks that are annotated. [Marshall 1998]
Digital Marginalia
Do we lose marginalia with digital documents?
Internet exposes information experiences
Meta-data, annotations, relationships
Large-scale information usage data
Change in focus
With marginalia, interest is in the individual
Now we can look at experiences in the aggregate
http://hint.fm/seer
Defining Behavioral Log Data

Behavioral log data are:
Traces of human behavior, seen through a sensor
Actual, real-world behavior
Not recalled behavior or subjective impressions
Large-scale, real-time

Behavioral log data are not:
Non-behavioral sources of large-scale data
Collected data (e.g., poll data, surveys, census data)
Crowdsourced data (e.g., Mechanical Turk)
Real-World, Large-Scale, Real-Time
Private behavior is exposed
Example: Porn queries, medical queries
Rare behavior is common
Example: Observe 500 million queries a day
Interested in behavior that occurs 0.002% of the time
Still observe the behavior 10 thousand times a day!
New behavior appears immediately
Example: Google Flu Trends
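The arithmetic behind the rare-behavior claim is worth spelling out: 0.002% of 500 million queries a day is still ten thousand observations a day.

```python
# Rare behavior at Web scale: 0.002% of 500 million daily queries.
queries_per_day = 500_000_000
rate = 0.002 / 100                        # 0.002% expressed as a fraction
daily_observations = round(queries_per_day * rate)
print(daily_observations)                 # ten thousand observations a day
```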
Overview
How behavioral log data can be used
Sources of behavioral log data
Challenges with privacy and data sharing
Example analysis of one source: Query logs
To understand people's information needs
To experiment with different systems
What behavioral logs cannot reveal
How to address limitations
Practical Uses for Behavioral Data
Behavioral data to improve Web search
Offline log analysis
Example: Re-finding is common, so add history support
Online log-based experiments
Example: Interleave different rankings to find the best algorithm
Log-based functionality
Example: Boost clicked results in a search result list
Behavioral data on the desktop
Goal: Allocate editorial resources to create Help docs
How to do so without knowing what people search for?
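The online interleaving experiment mentioned above is commonly run as team-draft interleaving. A minimal sketch under stated assumptions: the function names and the coin-flip tie-breaking policy are illustrative, not from the talk.

```python
import random

def team_draft_interleave(ranking_a, ranking_b, seed=0):
    """Merge two rankings so each contributes results fairly; remember
    which ranker 'drafted' each result so clicks can be credited."""
    rng = random.Random(seed)
    a, b = list(ranking_a), list(ranking_b)
    interleaved, team_a, team_b = [], set(), set()
    while a or b:
        # Skip documents the other ranker already placed.
        a = [d for d in a if d not in interleaved]
        b = [d for d in b if d not in interleaved]
        if not a and not b:
            break
        # The team with fewer picks drafts next; flip a coin on ties.
        pick_a = bool(a) and (not b or len(team_a) < len(team_b) or
                              (len(team_a) == len(team_b) and rng.random() < 0.5))
        if pick_a:
            doc = a.pop(0); team_a.add(doc)
        else:
            doc = b.pop(0); team_b.add(doc)
        interleaved.append(doc)
    return interleaved, team_a, team_b

def interleaving_winner(team_a, team_b, clicked):
    """Credit each click to the ranker that drafted the clicked result."""
    ca, cb = len(set(clicked) & team_a), len(set(clicked) & team_b)
    return 'A' if ca > cb else 'B' if cb > ca else 'tie'
```

Aggregated over many impressions, the ranker that wins more often is preferred by users, without ever splitting traffic into separate result pages.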
Societal Uses of Behavioral Data
Understand people's information needs
Understand what people talk about
Impact public policy? (E.g., DonorsChoose.org)
[Baeza-Yates et al. 2007]
Generalizing About Behavior
[Diagram: generalizing from observed signals to human behavior: button clicks, feature use, structured answers, information use, information needs, what people think]
Personal Use of Behavioral Data
Individuals now have a lot of behavioral data
Introspection of personal data popular
My Year in Status
Status Statistics
Expect to see more
As compared to others
For a purpose
Overview
Behavioral logs give practical, societal, personal insight
Sources of behavioral log data
Challenges with privacy and data sharing
Example analysis of one source: Query logs
To understand people's information needs
To experiment with different systems
What behavioral logs cannot reveal
How to address limitations
Web Service Logs
Example sources
Search engines
Commercial websites
Types of information
Behavior: Queries, clicks
Content: Results, products
Example analysis
Query ambiguity
Teevan, Dumais & Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR 2008.
[Screenshot: interpretations of the ambiguous query "jitp": Integral Theory and Practice, Parenting, IT & Politics]
Public Web Service Content
Example sources
Social network sites
Wiki change logs
Types of information
Public content
Dependent on service
Example analysis
Twitter topic models
Ramage, Dumais & Liebling. Characterizing microblogging using latent topic models. ICWSM 2010.
http://twahpic.cloudapp.net
Web Browser Logs
Example sources
Proxies
Toolbars
Types of information
Behavior: URL visits
Content: Settings, pages
Example analysis
Diff-IE (http://bit.ly/DiffIE)
Teevan, Dumais & Liebling. A longitudinal study of how highlighting Web content change affects people's Web interactions. CHI 2010.
Web Browser Logs
Example sources
Proxies
Toolbars
Types of information
Behavior: URL visits
Content: Settings, pages
Example analysis
Webpage revisitation
Adar, Teevan & Dumais. Large scale analysis of Web revisitation patterns. CHI 2008.
Client-Side Logs
Example sources
Client applications
Operating system
Types of information
Web client interactions
Other interactions: rich!
Example analysis
Stuff I've Seen
Dumais, Cutrell, Cadiz, Jancke, Sarin & Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003.
Types of Logs Rich and Varied
Sources of Log Data                          Types of Information Logged
Web services (search engines,                Interactions: queries, clicks
  commerce sites)                            Context: results, ads
Public Web services (social network          Posts, edits
  sites, wiki change logs)
Web browsers (proxies, toolbars              URL visits; Web pages shown
  or plug-ins)
Client applications                          System interactions
Public Sources of Behavioral Logs
Public Web service content
Twitter, Facebook, Digg, Wikipedia
At JITP: InfoExtractor, Facebook Harvester, scraping tools
Research efforts to create logs
At JITP: Roxy, a research proxy
Lemur Community Query Log Project
http://lemurstudy.cs.umass.edu/
1 year of data collection = 6 seconds of Google logs
Publicly released private logs
DonorsChoose.org
http://developer.donorschoose.org/the-data
Enron corpus, AOL search logs, Netflix ratings
Example: AOL Search Dataset
August 4, 2006: Logs released to academic community
3 months, 650 thousand users, 20 million queries
Logs contain anonymized user IDs
August 7, 2006: AOL pulled the files, but already mirrored
August 9, 2006: New York Times identified Thelma Arnold
"A Face Is Exposed for AOL Searcher No. 4417749"
Queries for businesses, services in Lilburn, GA (pop. 11k)
Queries for Jarrett Arnold (and others of the Arnold clan)
NYT contacted all 14 people in Lilburn with Arnold surname
When contacted, Thelma Arnold acknowledged her queries
August 21, 2006: 2 AOL employees fired, CTO resigned
September 2006: Class action lawsuit filed against AOL

AnonID   Query                          QueryTime            ItemRank  ClickURL
-------  -----------------------------  -------------------  --------  --------
1234567  jitp                           2006-04-04 18:18:18  1         http://www.jitp.net/
1234567  jipt submission process        2006-04-04 18:18:18  3         http://www.jitp.net/m_mscript.php?p=2
1234567  computational social scinece   2006-04-24 09:19:32
1234567  computational social science   2006-04-24 09:20:04  2         http://socialcomplexity.gmu.edu/phd.php
1234567  seattle restaurants            2006-04-24 09:25:50  2         http://seattletimes.nwsource.com/rests
1234567  perlman montreal               2006-04-24 10:15:14  4         http://oldwww.acm.org/perlman/guide.html
1234567  jitp 2006 notification         2006-05-20 13:13:13
…
Example: AOL Search Dataset
Other well known AOL users
User 927: how to kill your wife
User 711391: i love alaska
http://www.minimovies.org/documentaires/view/ilovealaska
Anonymous IDs do not make logs anonymous
Contain directly identifiable information
Names, phone numbers, credit cards, social security numbers
Contain indirectly identifiable information
Example: Thelma's queries
Birthdate, gender, zip code identifies 87% of Americans
Example: Netflix Challenge
October 2, 2006: Netflix announces contest
Predict people's ratings for a $1 million prize
100 million ratings, 480k users, 17k movies
Very careful with anonymity post-AOL
May 18, 2008: Data de-anonymized
Paper published by Narayanan & Shmatikov
Uses background knowledge from IMDB
Robust to perturbations in data
December 17, 2009: Doe v. Netflix
March 12, 2010: Netflix cancels second competition

Ratings file:
1:                          [Movie 1 of 17770]
12, 3, 2006-04-18           [CustomerID, Rating, Date]
1234, 5, 2003-07-08         [CustomerID, Rating, Date]
2468, 1, 2005-11-12         [CustomerID, Rating, Date]
…
Movie titles:
10120, 1982, "Bladerunner"
17690, 2007, "The Queen"
…

"All customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy. . . Even if, for example, you knew all your own ratings and their dates you probably couldn't identify them reliably in the data because only a small sample was included (less than one tenth of our complete dataset) and that data was subject to perturbation."
Overview
Behavioral logs give practical, societal, personal insight
Sources include Web services, browsers, client apps
Public sources limited due to privacy concerns
Example analysis of one source: Query logs
To understand people's information needs
To experiment with different systems
What behavioral logs cannot reveal
How to address limitations
Query                          Time      Date     User
jitp 2011                      10:41 am  5/15/11  142039
social science                 10:44 am  5/15/11  142039
computational social science   10:56 am  5/15/11  142039
jitp 2011                      11:21 am  5/15/11  659327
crowne plaza seattle           11:59 am  5/15/11  318222
restaurants seattle            12:01 pm  5/15/11  318222
pikes market restaurants       12:17 pm  5/15/11  318222
stuart shulman                 12:18 pm  5/15/11  142039
daytrips in seattle, wa        1:30 pm   5/15/11  554320
jitp 2011                      1:30 pm   5/15/11  659327
jitp program                   2:32 pm   5/15/11  435451
jitp2011.org                   2:42 pm   5/15/11  435451
computational social science   4:56 pm   5/15/11  142039
jitp 2011                      5:02 pm   5/15/11  312055
xxx clubs in seattle           10:14 pm  5/15/11  142039
sex videos                     1:49 am   5/16/11  142039
Raw logs are dirty: porn, other languages, spam, system errors.

Query                          Time      Date     User
jitp 2011                      10:41 am  5/15/11  142039
social science                 10:44 am  5/15/11  142039
teen sex                       10:56 am  5/15/11  142039   (porn)
jitp 2011                      11:21 am  5/15/11  659327
crowne plaza seattle           11:59 am  5/15/11  318222
restaurants seattle            12:01 pm  5/15/11  318222
pikes market restaurants       12:17 pm  5/15/11  318222
stuart shulman                 12:18 pm  5/15/11  142039
daytrips in seattle, wa        1:30 pm   5/15/11  554320
sex with animals               1:30 pm   5/15/11  659327   (porn)
jitp program                   2:32 pm   5/15/11  435451
jitp2011.org                   2:42 pm   5/15/11  435451
computational social science   4:56 pm   5/15/11  142039
jitp 2011                      5:02 pm   5/15/11  312055
xxx clubs in seattle           10:14 pm  5/15/11  142039   (porn)
sex videos                     1:49 am   5/16/11  142039   (porn)
cheap digital camera           12:17 pm  5/15/11  554320   (spam)
cheap digital camera           12:18 pm  5/15/11  554320   (spam)
cheap digital camera           12:19 pm  5/15/11  554320   (spam)
社会科学                       11:59 am  11/3/23           (language)
                               12:01 pm  11/3/23           (system error)

Data cleaning pragmatics
Significant part of data analysis
Ensure cleaning is appropriate
Keep track of the cleaning process
Keep the original data around
Example: ClimateGate
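The cleaning pragmatics above (drop records via named, auditable filters; keep the originals around) might be sketched as follows. The filter names and the tiny adult-term list are illustrative assumptions, not a real lexicon:

```python
def clean_log(records, filters):
    """Split records into kept and removed, recording why each record
    was dropped, so the cleaning process stays auditable. The input
    list is never modified: the original data stays around."""
    kept, removed = [], []
    for rec in records:
        reason = next((name for name, drop in filters.items() if drop(rec)), None)
        if reason is None:
            kept.append(rec)
        else:
            removed.append({'record': rec, 'reason': reason})
    return kept, removed

# Illustrative filters; a production adult-content lexicon is far larger.
ADULT_TERMS = {'sex', 'xxx'}
FILTERS = {
    'porn': lambda r: bool(ADULT_TERMS & set(r['query'].lower().split())),
    'empty': lambda r: not r['query'].strip(),
}
```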
[The clean query log from above, shown again across several slides to illustrate what can be studied: query typology; query behavior; long term trends]

Uses of Analysis
Ranking (e.g., precision)
System design (e.g., caching)
User interface (e.g., history)
Test set development
Complementary research
Things Observed in Query Logs
Summary measures
Query frequency: queries appear 3.97 times [Silverstein et al. 1999]
Query length: 2.35 terms [Jansen et al. 1998]
Analysis of query intent
Query types and topics: navigational, informational, transactional [Broder 2002]
Temporal features
Session length: sessions 2.20 queries long [Silverstein et al. 1999]
Common re-formulations [Lau and Horvitz 1999]
Click behavior
Relevant results for query
Queries that lead to clicks [Joachims 2002]
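Several of these summary measures can be computed directly from (user, query, time) records. A sketch assuming a 30-minute session timeout, which is a common convention rather than a number from the talk:

```python
from collections import Counter
from datetime import datetime, timedelta

def summarize_queries(log, session_timeout=timedelta(minutes=30)):
    """Compute query frequency, query length, and session length
    from (user, query, time) records."""
    freq = Counter(q for _, q, _ in log)
    avg_terms = sum(len(q.split()) for _, q, _ in log) / len(log)
    # Session segmentation: a new session starts after a gap > timeout.
    sessions = 0
    last_seen = {}  # user -> time of that user's previous query
    for user, _, t in sorted(log, key=lambda r: (r[0], r[2])):
        if user not in last_seen or t - last_seen[user] > session_timeout:
            sessions += 1
        last_seen[user] = t
    return {
        'avg_query_frequency': sum(freq.values()) / len(freq),
        'avg_query_length_terms': avg_terms,
        'avg_session_length_queries': len(log) / sessions,
    }
```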
Surprises About Query Log Data
From early log analysis
Examples: Jansen et al. 2000, Broder 1998
Queries are not 7 or 8 words long
Advanced operators not used or "misused"
Nobody used relevance feedback
Lots of people search for sex
Navigation behavior common
Prior experience was with library search
Surprises About Microblog Search?
[Screenshots: Twitter search results ordered by time vs. ordered by relevance, with an "8 new tweets" notification]

Microblog search: time important; people important; specialized syntax; queries common, repeated a lot, and change very little; often navigational
Web search: time and people less important; no syntax use; queries longer; queries develop
Generalizing Across Systems
Levels of generalization: a particular feature (Bing experiment #123); a web search engine (Bing); Web search engines (Bing, Google, Yahoo); search engines (different corpora); information seeking (browser, search, email)
Goals: build new features, build new tools, build better systems
Partitioning the Data
Corpus; language; location; device; time; user; system variant
[Baeza-Yates et al. 2007]
Partition by Time
Periodicities
Spikes
Real-time data: new behavior, immediate feedback
Individual: within session, across sessions
[Beitzel et al. 2004]
Partition by User
Temporary ID (e.g., cookie, IP address)
High coverage but high churn
Does not necessarily map directly to users
User account
Only a subset of users
[Teevan et al. 2007]
Partition by System Variant
Also known as controlled experiments
Some people see one variant, others another
Example: What color for search result links?
Bing tested 40 colors
Identified #0044CC
Value: $80 million
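Controlled experiments like the link-color test need a stable assignment of users to variants. A standard hash-based sketch; the helper name and the two alternate color codes are made up (#0044CC is the winning color from the slide):

```python
import hashlib

def assign_variant(user_id, experiment, variants):
    """Deterministically bucket a user into one experiment variant.

    The same user always sees the same variant, and different
    experiments hash independently because the experiment name
    is part of the hash key."""
    key = f'{experiment}:{user_id}'.encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]

# Example: a link-color test with illustrative alternatives.
colors = ['#0044CC', '#0000FF', '#0066EE']
variant = assign_variant('142039', 'link-color', colors)
```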
Everything is Significant
Everything is significant, but not always meaningful
Choose the metrics you care about first
Look for converging evidence
Choose comparison group carefully
From the same time period
Log a lot because it can be hard to recreate state
Confirm with metrics that should be the same
High variance, calculate empirically
Look at the data
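Why is everything significant at log scale? A two-proportion z-test on click-through rates makes the point: a 0.1-point difference over millions of impressions yields an enormous z even when the change is too small to matter. The counts below are illustrative:

```python
from math import sqrt, erf

def two_proportion_z(clicks_a, n_a, clicks_b, n_b):
    """Two-sided two-proportion z-test on click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Normal-approximation p-value via the error function.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# A 20.1% vs. 20.0% CTR difference over 10M impressions per arm is
# wildly 'significant' even though it may not be meaningful.
z, p = two_proportion_z(2_010_000, 10_000_000, 2_000_000, 10_000_000)
```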
Overview
Behavioral logs give practical, societal, personal insight
Sources include Web services, browsers, client apps
Public sources limited due to privacy concerns
Partition query logs to view interesting slices
By corpus, time, individual
By system variant = experiment
What behavioral logs cannot reveal
How to address limitations
What Logs Cannot Tell Us
People's intent
People's success
People's experience
People's attention
People's beliefs of what happens
Behavior can mean many things
81% of search sequences ambiguous [Viermetz et al. 2006]

[Example timeline: 7:12 Query; 7:14 Click Result 1 (open in new tab); 7:15 Click Result 3 (open in new tab); 7:16 Read Result 1, back to results, or try new engine?; 7:20 Read Result 3; 7:27 Save links locally]
Example: Click Entropy
Question: How ambiguous is a query?
Approach: Look at variation in clicks [Teevan et al. 2008]
Measure: Click entropy
Low if no variation (e.g., journal of information …)
High if lots of variation (e.g., jitp)
[Screenshot: results for the ambiguous query "jitp": Integral Theory and Practice, Parenting, IT & Politics]
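Click entropy as described here is the Shannon entropy of the distribution of clicks over results for a query. A minimal sketch:

```python
from collections import Counter
from math import log2

def click_entropy(clicked_urls):
    """Click entropy of a query: H = -sum p(u) * log2 p(u) over the
    URLs clicked for that query. Low entropy means everyone clicks
    the same result; high entropy means clicks spread widely."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

# All clicks on one URL -> 0 bits; an even split over 4 URLs -> 2 bits.
```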
Which Has Less Variation in Clicks?
www.usajobs.gov vs. federal government jobs
find phone number vs. msn live search
singapore pools vs. singaporepools.com
tiffany vs. tiffany's
nytimes vs. connecticut newspapers
campbells soup recipes vs. vegetable soup recipe
soccer rules vs. hockey equipment

Caveats when interpreting click variation:
Results change
Result quality varies
Task impacts number of clicks
Clicks/user: 1.1 vs. 2.1
Click position: 2.6 vs. 1.6
Result entropy: 5.7 vs. 10.7
Beware of Adversaries
Robots try to take advantage of your service
Queries too fast or common to be a human
Queries too specialized (and repeated) to be real
Spammers try to influence your interpretation
Click-fraud, link farms, misleading content
Never-ending arms race
Look for unusual clusters of behavior
Adversarial use of log data [Fetterly et al. 2004]
Beware of Tyranny of the Data
Can provide insight into behavior
Example: What is searched for, how needs are expressed
Can be used to test hypotheses
Example: Compare ranking variants or link color
Can only reveal what can be observed
Cannot tell you what you cannot observe
Example: Nobody uses Twitter to re-find
Supplementing Log Data
Enhance log data
Collect associated information
Example: For browser logs, crawl visited webpages
Instrumented panels
Converging methods
Usability studies
Eye tracking
Surveys
Field studies
Diary studies
Example: Re-Finding Intent
Large-scale log analysis of re-finding [Tyler and Teevan 2010]
Do people know they are re-finding?
Do they mean to re-find the result they do?
Why are they returning to the result?
Small-scale critical incident user study
Browser plug-in that logs queries and clicks
Pop up survey on repeat clicks and 1/8 of new clicks
Insight into intent + rich, real-world picture
Re-finding often targeted towards a particular URL
Not targeted when query changes or in same session
Summary
Behavioral logs give practical, societal, personal insight
Sources include Web services, browsers, client apps
Public sources limited due to privacy concerns
Partition query logs to view interesting slices
By corpus, time, individual
By system variant = experiment
Behavioral logs are powerful but not a complete picture
Can expose small differences and tail behavior
Cannot expose motivation, and logged behavior is sometimes adversarial
Look at the logs and supplement with complementary data
Jaime Teevan
teevan@microsoft.com
Questions?
References
Adar, E., J. Teevan and S.T. Dumais. Large scale analysis of Web revisitation patterns. CHI 2008.
Akers, D., M. Simpson, T. Winograd and R. Jeffries. Undo and erase events as indicators of usability problems. CHI 2009.
Beitzel, S.M., E.C. Jensen, A. Chowdhury, D. Grossman and O. Frieder. Hourly analysis of a very large topically categorized Web query log. SIGIR 2004.
Broder, A. A taxonomy of Web search. SIGIR Forum 36(2), 2002.
Chilton, L. and J. Teevan. Addressing information needs directly in the search result page. WWW 2011.
Cutrell, E., D.C. Robbins, S.T. Dumais and R. Sarin. Fast, flexible filtering with Phlat: Personal search and organization made easy. CHI 2006.
Dagon, D. Botnet detection and response: The network is the infection. OARC Workshop 2005.
Dasu, T. and T. Johnson. Exploratory data mining and data cleaning. 2004.
Dumais, S.T., E. Cutrell, J.J. Cadiz, G. Jancke, R. Sarin and D.C. Robbins. Stuff I've Seen: A system for personal information retrieval and re-use. SIGIR 2003.
Fetterly, D., M. Manasse and M. Najork. Spam, damn spam, and statistics: Using statistical analysis to locate spam Web pages. Workshop on the Web and Databases 2004.
Fox, S., K. Karnawat, M. Mydland, S.T. Dumais and T. White. Evaluating implicit measures to improve Web search. TOIS 23(2), 2005.
Jansen, B.J., A. Spink, J. Bateman and T. Saracevic. Real life information retrieval: A study of user queries on the Web. SIGIR Forum 32(1), 1998.
Joachims, T. Optimizing search engines using clickthrough data. KDD 2002.
Kellar, M., C. Watters and M. Shepherd. The impact of task on the usage of Web browser navigation mechanisms. GI 2006.
Kohavi, R., R. Longbotham, D. Sommerfield and R.M. Henne. Controlled experiments on the Web: Survey and practical guide. Data Mining and Knowledge Discovery 18(1), 2009.
Kohavi, R., R. Longbotham and T. Walker. Online experiments: Practical lessons. IEEE Computer 43(9), 2010.
Kotov, A., P. Bennett, R.W. White, S.T. Dumais and J. Teevan. Modeling and analysis of cross-session search tasks. SIGIR 2011.
Kulkarni, A., J. Teevan, K.M. Svore and S.T. Dumais. Understanding temporal query dynamics. WSDM 2011.
Lau, T. and E. Horvitz. Patterns of search: Analyzing and modeling Web query refinement. User Modeling 1999.
Marshall, C.C. The future of annotation in a digital (paper) world. GSLIS Clinic 1998.
Narayanan, A. and V. Shmatikov. Robust de-anonymization of large sparse datasets. IEEE Symposium on Security and Privacy 2008.
Silverstein, C., M. Henzinger, H. Marais and M. Moricz. Analysis of a very large Web search engine query log. SIGIR Forum 33(1), 1999.
Tang, D., A. Agarwal and D. O'Brien. Overlapping experiment infrastructure: More, better, faster experimentation. KDD 2010.
Teevan, J., E. Adar, R. Jones and M. Potts. Information re-retrieval: Repeat queries in Yahoo's logs. SIGIR 2007.
Teevan, J., S.T. Dumais and D.J. Liebling. To personalize or not to personalize: Modeling queries with variation in user intent. SIGIR 2008.
Teevan, J., S.T. Dumais and D.J. Liebling. A longitudinal study of how highlighting Web content change affects people's Web interactions. CHI 2010.
Teevan, J., D.J. Liebling and G.R. Geetha. Understanding and predicting personal navigation. WSDM 2011.
Teevan, J., D. Ramage and M.R. Morris. #TwitterSearch: A comparison of microblog search and Web search. WSDM 2011.
Tyler, S.K. and J. Teevan. Large scale query log analysis of re-finding. WSDM 2010.
Viermetz, M., C. Stolz, V. Gedov and M. Skubacz. Relevance and impact of tabbed browsing behavior on Web usage mining. Web Intelligence 2006.
Weinreich, H., H. Obendorf, E. Herder and M. Mayer. Off the beaten tracks: Exploring three aspects of Web navigation. WWW 2006.
White, R.W., S.T. Dumais and J. Teevan. Characterizing the influence of domain expertise on Web search behavior. WSDM 2009.
Baeza-Yates, R., G. Dupret and J. Velasco. A study of mobile search queries in Japan. Query Log Analysis: Social and Technological Challenges, WWW 2007.