Slide1
The Evolving Internet: Some Implications, Strategies, and Techniques for More Effective Research
MSU Product Center
September 26, 2007
Professor Larry G. Hamm
Slide2
Presentation Outline
Introduction
Search Engine Basics
Business Search with Google
News Search
Social Search
Basic Information Trapping
The Future??
Slide3
QUESTIONS?
Who is Tim Berners-Lee?
What happened for “research” in 1990?
Slide4
Slide5
Current Number of Websites
July 2007 - 489,774,269
Slide6
Top Global Web Properties Ranked by Total Unique Visitors (000)*
June 2007
Total Worldwide, Age 15+ - Home and Work Locations
Property                          Number (000’s)   Percent Reach
Total Unique Internet Visitors    778,310          100%
Google Sites                      544,783          70
Microsoft Sites                   529,155          68
Yahoo! Sites                      471,924          61
Time Warner Network               266,367          34
eBay                              264,732          34
Wikipedia Sites                   208,120          27
Fox Interactive Media             163,545          21
Amazon Sites                      145,947          19
Apple Inc.                        123,554          16
Adobe Sites                       121,966          16
CNET Networks                     116,579          15
Ask Network                       115,655          15
Viacom Digital                    88,654           11
Lycos Sites                       77,517           10
The Mozilla Organization          70,850           9
Slide7
Share of Online Searches by Engine
August 2007
Total U.S. Home, Work and University Internet Users
Source: comScore qSearch
Engine                       Share of Searches, Aug 07
Total Internet Population    100%
Google Sites                 56.5%
Yahoo! Sites                 23.3%
Microsoft Sites              11.3%
Ask Network                  4.5%
Time Warner Network          4.5%
* Excludes traffic from public computers such as Internet cafes or access from mobile phones or PDAs.
Slide8
Share of Online Searches by Engine
August 2007
Total U.S. Home, Work and University Internet Users
Source: comScore qSearch
Engine                       Number of Searches (Million), Aug 07
Total Number of Searches     9820
Google Sites                 5545
Yahoo! Sites                 2290
Microsoft Sites              1106
Ask Network                  438
Time Warner Network          441
* Excludes traffic from public computers such as Internet cafes or access from mobile phones or PDAs.
Slide9
Herbert Simon, Nobel Prize Economist:
“What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention”
SOURCE: “Designing Organizations for an Information-Rich World,” in Donald M. Lamberton, ed., The Economics of Communication and Information (Cheltenham, England: Edward Elgar, 1997).
Slide10
The Source of Power?
Knowledge is no longer the “scarce” resource.
Attention is the “limiting factor”!
Implications:
Global--- Decisions on what is brought into global consciousness
Research --- Discipline to direct and control your attention
Slide11
The Role of ATTENTION
THEREFORE:
“The most important function of attention is not taking information in, but screening it out.”
Slide12
Introduction
The Meaning of Relevance
Definition:
The degree to which a search record (piece of information) meets the researcher’s query.
PROBLEM - Relevance to the search engine and relevance to the researcher are DIFFERENT
To a researcher: Does the result help answer the intent of the query?
To a search engine: Does the result satisfy the search engine’s ranking algorithm?
Slide13
Slide14
Summary and Conclusion
Precision searching requires the process of consciously narrowing and eliminating the gap between the researcher’s and the search engine’s RELEVANCY
Knowledge of the search process and of the characteristics of information sources is required to attack search engine relevance.
Intuition is required by the researcher to focus on formulating the search statements.
Slide15
Search Engine Basics
The Invisible versus the Visible Web
Defining and Identifying Search Engines
How Search Engines Work
Why Google?
Slide16
The Invisible Web
Great amounts of information exist that are not accessible via Internet search engines
Much was formatted digitally but not ‘indexed’ (see later lecture)
“Google Books” project is the grandest attempt to date to ‘shrink’ the invisible web.
Invisible Web information is differentiated by:
ACCESS
MODE of creation
Slide17
The Invisible Web (continued)
Information is differentiated by the nature of ACCESS to it:
1. Publicly available --- Libraries
2. Semi-public --- ‘Private’ Libraries, e.g., MSU Libraries
3. Private data --- Only available for purchase or through reciprocity
Slide18
The Invisible Web (continued)
Types of Private Data –
Private data sets open to anyone with a checkbook (Mintel)
Restricted private data sets --- to contributors (Trade Association)
Proprietary data of individual firms/public institutions (Freedom of Information Act)
Spy data (commercial and public)
Private data interfaces with ‘Searchable’ data when private data firms use “free sample” or “versioning” marketing strategies
Slide19
The Invisible Web (continued)
Differentiated by MODE of creation: PRIMARY versus SECONDARY Data
Primary Data is data collected/generated through direct observation, survey, or poll
Secondary Data is data that is ‘repackaged’ primary data
Secondary data results from an ‘editing’ process
Evaluating secondary data requires an identification and evaluation of the base source(s)
Always go to the “ORIGINAL SOURCE”!!!
Slide20
The Visible Web
Defining and Identifying Search Engines
What is a search engine?
Definition – A search engine is an enormous database of websites compiled by a software robot that seeks out and indexes websites.
How does it work?
Sends a ‘spider’ or ‘crawler’ to visit a Web page and find the information on the page.
The ‘crawler’ then sends its “finds” to an indexer, which takes every word on a Web page, logs it, categorizes it, and then stores the results in a huge database.
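The crawl-and-index step described above can be illustrated with a toy inverted index. This is only a minimal sketch in Python, not how any real search engine is built: the `PAGES` dictionary stands in for pages a crawler has already fetched, and the URLs and the names `build_index` and `search` are hypothetical, chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical pages that a crawler has already visited (URL -> page text).
PAGES = {
    "http://example.com/a": "Organic milk prices rise in Michigan",
    "http://example.com/b": "Apple orchards report a record harvest",
    "http://example.com/c": "Milk producers watch feed prices",
}


def build_index(pages):
    """Log every word on every page and record which pages contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


INDEX = build_index(PAGES)
print(sorted(search(INDEX, "milk prices")))
```

The index is built once, so each query only looks up words in the stored database instead of re-reading the pages.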
Slide21
Defining and Identifying Search Engines
What types of search engines exist?
www.searchengineshowdown.com
www.lib.berkeley.edu/TeachingLib/Guides/Internet/FindInfo.html
General All Purpose Search Engines (Big 4) – Google; YahooSearch; Live.com; Ask.com
Metasearch Engines – Search engines that search other search engines (S.E. ‘bot’) –
www.dogpile.com
www.clusty.com
www.kartoo.com
Slide22
Defining and Identifying Search Engines
What types of search engines exist? (continued)
Specialized Search Engines (Vertical Search Engines) – Search engines dedicated to specific subject areas or specific purposes. For research: www.lii.org
“Customized Search Engine” – Now anyone can create one: www.google.com/coop/cse/ --- See www.customsearchguide.com
Slide23
The Visible Web
How Search Engines Work
Search Engine – RANKING ALGORITHMS
WHAT? – Ranking algorithms are used to ORDER the search results
WHY DOES ORDER MATTER? Answer - ATTENTION, because the researcher wants ‘help’ in deciding which results are relevant to his or her needs
HOW? - Most ranking algorithms have ordered, and continue to order, results by how frequently the searched “WORDS” appear
Google added a new element to its ranking algorithm
Slide24
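Ordering results by how often the searched words appear, as described above, can be sketched in a few lines of Python. This is an illustrative assumption about the simplest form of word-frequency ranking, not any engine's actual algorithm; the sample pages and the name `rank_by_word_frequency` are hypothetical.

```python
import re

# Hypothetical page texts keyed by URL.
SAMPLE_PAGES = {
    "http://example.com/dairy": "Milk supply grows. Milk demand grows. Milk prices fall.",
    "http://example.com/retail": "Grocers adjust milk prices weekly.",
    "http://example.com/fruit": "Apple harvest sets a record.",
}


def rank_by_word_frequency(pages, query):
    """Order pages by how many times the query words occur in them."""
    terms = query.lower().split()
    scored = []
    for url, text in pages.items():
        words = re.findall(r"[a-z0-9]+", text.lower())
        score = sum(words.count(term) for term in terms)
        if score > 0:
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]


print(rank_by_word_frequency(SAMPLE_PAGES, "milk"))
```

A page that merely repeats a word many times rises to the top under this scheme, which is one reason a frequency count alone is a weak signal of relevance to the researcher.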
The Visible Web
How Google Works
1. The web server sends the query to the index servers. The content inside the index servers is similar to the index in the back of a book - it tells which pages contain the words that match the query.
2. The query travels to the doc servers, which actually retrieve the stored documents. Snippets are generated to describe each search result.
3. The search results are returned to the user in a fraction of a second.
Slide25
The Visible Web
Conclusion
An Overview of a Basic Search
Be very proficient with ONE search engine
Remember: because of different software approaches and indexing, NO TWO SEARCH ENGINES WILL PRODUCE THE SAME RESULTS
When the search is very focused and narrowed, identify and use other specific engines
Should the “Product Center” create its own?
Slide26
Business Search with Google
Translating Web Language
Underlying Search Logic
Understanding Google Search Features
Conclusion
Slide27
Translating Web Language
Reading URLs – Uniform Resource Locator
This is the Web site’s address, i.e., where a Web site lives
Example:
http://online.wsj.com/article/SB114609925357637113.html
http: - Hypertext Transfer Protocol, the way information is transferred on the Web
HTML – Hypertext Markup Language is the current Web language
XML (eXtensible Markup Language) is coming as the vehicle for information trapping
Slide28
Translating Web Language (continued)
Reading URLs (continued)
online.wsj.com (domain name) of the server
Domain Suffix (com) – Perhaps the first and most important thing to examine
Assigned by ICANN – Internet Corporation for Assigned Names and Numbers – www.icann.org
Country codes (.uk) follow the domain suffix; (.us) is not used by most U.S. sites except state/local government sites. Current issues?
Slide29
Translating Web Language (continued)
Reading URLs (continued)
Common DOMAIN SUFFIXES
.com - commercial site
.edu - educational institution
.gov - government agency in the U.S.
.net - network with most assigned to ISP networks
.org - non-profit/non-commercial organization (Caution: many companies are setting up “non-profits” to get .org domain suffixes to disguise their agendas)
OTHERS - .mil, .biz, .info, .coop, .pro
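Reading a URL in this way can be automated. The sketch below uses Python's standard `urllib.parse.urlsplit` on the example address from the earlier slide; the function name `read_url` and the `SUFFIX_MEANINGS` table are illustrative assumptions, and taking the last dot-separated label as the suffix is a deliberately naive rule.

```python
from urllib.parse import urlsplit

# Meanings taken from the list of common domain suffixes above.
SUFFIX_MEANINGS = {
    "com": "commercial site",
    "edu": "educational institution",
    "gov": "government agency in the U.S.",
    "net": "network, mostly ISP networks",
    "org": "non-profit/non-commercial organization",
}


def read_url(url):
    """Split a URL into the parts a researcher should examine."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    # Naive rule: the suffix is whatever follows the last dot in the host.
    suffix = host.rsplit(".", 1)[-1] if "." in host else ""
    return {
        "protocol": parts.scheme,
        "domain": host,
        "suffix": suffix,
        "meaning": SUFFIX_MEANINGS.get(suffix, "other suffix or country code"),
        "path": parts.path,
    }


info = read_url("http://online.wsj.com/article/SB114609925357637113.html")
for key, value in info.items():
    print(key, "=", value)
```

For an address ending in a country code, this rule reports the country code as the suffix, so the result is only a starting point for judging a source.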
Slide30
Underlying Search Logic
Boolean Logic Searches
Definition - Use of mathematical set theory to retrieve search information.
AND, OR, and NOT searches
See following Venn diagrams:
Slide31
Underlying Search Logic (continued)
Boolean Logic Searches - AND
Slide32
The Visible Web
Why Google?
(continued)
Boolean Logic Searches - OR
Slide33
Underlying Search Logic (continued)
Boolean Logic Searches - NOT
Slide34
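The AND, OR, and NOT searches shown in the Venn diagram slides above correspond directly to set intersection, union, and difference. A minimal sketch in Python, assuming a tiny hypothetical collection of documents (`DOCS`) and a helper named `matching` introduced only for this illustration:

```python
# Hypothetical mini-collection: document number -> text.
DOCS = {
    1: "organic milk prices in michigan",
    2: "organic apple orchards",
    3: "milk prices fall",
    4: "michigan apple harvest",
}


def matching(term):
    """Return the set of documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in DOCS.items() if term in text.split()}


milk_and_organic = matching("milk") & matching("organic")  # AND: intersection
milk_or_apple = matching("milk") | matching("apple")       # OR: union
milk_not_organic = matching("milk") - matching("organic")  # NOT: difference

print(milk_and_organic, milk_or_apple, milk_not_organic)
```

AND narrows the result set, OR widens it, and NOT removes unwanted records, which is why combining the three lets a researcher close in on a precise query.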
The Visible Web
Why Google?
Google Has Two Basic Strengths Over Other Search Engines
Popularity Ranking
Number of and Breadth of Features
Slide35
The Visible Web
Why Google?
(continued)
“Popularity” Ranking – “The Google Creation”
A page’s ranking includes a score for how many “other pages” link to it, i.e., how ‘popular’ it is with other Web sites
This is done on multiple levels. For example: if pages X and Y both have 100 pages linked to them, but the 100 pages linking to Y have more links to them than do the 100 pages linking to X, Y gets a higher ranking score
Slide36
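The multi-level idea in the X and Y example above can be sketched as a simplified link-popularity calculation in the spirit of PageRank. This is an assumption-laden illustration, not Google's actual implementation: the damping value of 0.85, the iteration count, the tiny link graph, and the name `link_popularity` are all chosen here for demonstration.

```python
def link_popularity(links, damping=0.85, iterations=50):
    """Score pages so that links from well-linked pages count for more.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly across its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# X and Y each have two pages linking to them, but only the pages
# linking to Y are themselves linked to by other pages.
LINKS = {
    "x1": ["X"], "x2": ["X"],
    "y1": ["Y"], "y2": ["Y"],
    "z1": ["y1"], "z2": ["y1"], "z3": ["y2"], "z4": ["y2"],
}
scores = link_popularity(LINKS)
print(round(scores["X"], 4), round(scores["Y"], 4))
```

In this toy graph Y ends up with the higher score even though X and Y have the same number of direct links, which mirrors the slide's example.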
The Visible Web
Why Google?
(continued)
“Popularity” Ranking – “The Google Creation”
(continued)
THE UNDERLYING ASSUMPTION:
A Web page that has more pages linked to it indirectly (like a pyramid scheme) implies that more pages find it relevant, which implies that it will be more relevant to you.
Analogy – Your popularity within high school is ranked by how many friends your friends have, how many friends those friends have, and so on.
Slide37
The Visible Web
Why Google?
(continued)
“Popularity” Ranking – “The Google Creation”
(continued)
“THE GOOGLE BIAS”
New pages won’t have as many links as established pages and therefore get a lower ranking. Analogy: New friends might be better than the old friends.
Slide38
The Visible Web
Why Google?
(continued)
Google’s Breadth of Features
Home Page Features – One of the Cleanest/Clutter-Free Pages
Advanced Search Features
Features useful for business research are highlighted here
Slide39
The Visible Web
Why Google?
(continued)
Google’s Advanced Search Features
Advanced features allow searchers to narrow their queries to very specific searches
Narrowed searches allow the gap between ‘researcher’ and ‘search engine’ RELEVANCY to close much more quickly
With precision query formulation, the search will be faster and more useful
8 highlighted advanced features
Slide40
The Visible Web
Why Google?
(continued)
1. Google uses a modified Boolean Search
Searches can be done from the Google Home Page or from the Advanced Features Page
Slide41
The Visible Web
Why Google?
(continued)
“Phrase Searching”
Google automatically “ANDs” words
Accepts one or more “ORs”
Use a minus sign in front of a term to “NOT” it
Google will not search on very common “STOP” words like “a”, “it”, and “the”.
Slide42
The Visible Web
Why Google?
(continued)
2. Option to retrieve only a specific file format
(pdf), (ps), (xls), (ppt), (doc), (rtf)
Very useful if searching for a certain ‘type’ of data. For example: xls for financial data.
Slide43
The Visible Web
Why Google?
(continued)
3. Date restrictions
4. Window to limit retrieval to title or URL fields
5. Box for limiting to (or excluding) a particular DOMAIN or URL
Slide44
The Visible Web
Why Google?
(continued)
6. Page-Specific Searches:
for pages similar to the entered URL
for pages that link to the entered URL
7. Links to “Topic-Specific Searches”
8. Domain-specific searches for .gov, .mil, and .edu
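Several of the advanced features listed above can also be typed directly into the search box as operators. The sketch below assembles such a query string in Python; the function name `build_query` and its parameters are hypothetical, and it assumes the commonly documented Google operators (quotation marks for a phrase, `OR`, a leading minus sign, `filetype:`, `site:`, and `intitle:`), whose support may change over time.

```python
def build_query(words=(), phrase=None, any_of=(), exclude=(),
                filetype=None, site=None, in_title=None):
    """Assemble a search string from the advanced-search choices."""
    parts = list(words)                      # plain words are ANDed automatically
    if phrase:
        parts.append('"' + phrase + '"')     # exact phrase search
    if any_of:
        parts.append(" OR ".join(any_of))    # alternatives
    parts.extend("-" + term for term in exclude)  # NOT terms
    if filetype:
        parts.append("filetype:" + filetype)      # specific file format
    if site:
        parts.append("site:" + site)              # limit to a domain or suffix
    if in_title:
        parts.append("intitle:" + in_title)       # limit to the title field
    return " ".join(parts)


query = build_query(
    words=["dairy", "prices"],
    phrase="organic milk",
    exclude=["soy"],
    filetype="xls",
    site=".gov",
)
print(query)
```

Building the query from named choices makes it easy to vary one restriction at a time, for example switching the file format or the domain suffix, while keeping the rest of the search fixed.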
Slide45
Everything About Google??
http://www.google.com/intl/en/help/refinesearch.html#domain
http://www.google.com/intl/en/help/operators.html
http://www.google.com/intl/en/help/cheatsheet.html
http://www.google.com/intl/en/help/features.html
http://www.google.com/options/
Slide46
The Visible Web
The Greatest Google Feature??
Skip the Title - Click the cache?
WHY?
Google ‘highlights’ (in a different color) keywords/phrases
No pop-ups that are attached to Web pages
Faster – Google’s servers are the best in the world
Allows for ‘text only’ versions
Allows access when the current site is ‘unavailable’
Slide47
The Visible Web
The Greatest Google Feature??
Further ‘Search’ Within the Result Generated Sites
If you are viewing the titled page rather than the cache, use the browser’s “Find” function (Control+F) to show keywords/phrases
Use (Control+F) for a NEW search with new words and phrases
Slide48
The Visible Web
Conclusions
Is the desired information CONCEPTUAL or FACTUAL?
If Conceptual: In-depth research (library, books, scholarly journals, etc.) is most likely necessary to frame the search effectively.
If Factual: A search engine web search can most likely proceed
But always strive to find the “Original Source”
Slide49
The Visible Web
Conclusions
Set a time limit - ‘Web Surfing’ can be addictive, causing:
Tendencies to wander off task
Attention ‘fatigue’, resulting in overlooking possible sources
All other forms of destructive social and moral behaviors.
Slide50
News Searching
What Do You Want:
Read news without ‘a paper, TV, or radio’?
Just see last second’s headline?
Find older stories?
Monitor an industry?
Other?
Slide51
NEWS SOURCE GENERATED INFORMATION
Introduction
The evolution of news based information
Story telling
town criers
news posters/papers
electronic news divisions
the WEB
News is now a ‘commodity’
Minimal costs of distribution
‘Creation of news content’ is believed by many to be unrestricted (text messages, cell phone pictures, etc.)
Many believe that, with the ‘information democracy’, they have the “right” to create news and that their “news” is as legitimate as anyone else’s
Slide52
NEWS SOURCE GENERATED INFORMATION
Introduction
(continued)
News
Differentiation
Attention
Merger of News & Entertainment
The Daily Show, The Colbert Report, etc.
Slide53
NEWS SOURCE GENERATED INFORMATION
Five Specific Types of Web News Outlets
1. Individual Online News Sites
2. Breaking News Aggregators
3. News Alert Services
4. Searchable News Data Bases
5. Industry News Sites
Slide54
NEWS SOURCE GENERATED INFORMATION
Web News Outlets
(continued)
1. Individual Online News Sites
Definition – Migrations of existing established media outlets to a Web-based platform
Examples: CNN.com, nytimes.com, online.wsj.com
Usually have graphics and delivery methods similar to the parent outlet
Mix of “free” and for-fee services
Most have archives, with most non-current material available for a fee (NYT’s recent decision!)
Slide55
NEWS SOURCE GENERATED INFORMATION
Web News Outlets
(continued)
2. Breaking News Aggregators
Definition – Sites that pull material from multiple online news sources
Usually limited to recent “Headline” material
Use to do a keyword search for relevant news articles
Use when the individual site does not cover all possible relevant (geographic/minor stories) information
Slide56
NEWS SOURCE GENERATED INFORMATION
Web News Outlets
(continued)
2. Breaking News Aggregators (continued)
Personalize one of the general portal sites (Google News, My Yahoo) and make it your “start page”
Go to a “news service” site like BBC, CNN, MSNBC, etc.
Go to your favorite newspaper and set up an RSS feed
Slide57
NEWS SOURCE GENERATED INFORMATION
Web News Outlets
(continued)
3. News Alert Services
Definition – Same as breaking news aggregators, except that a “User Profile” can be created
Delivery method is via e-mail or Web site
Useful if your particular interest is a company, product, topic, etc.
Issues include:
Completeness of what is delivered (original source, abstract)
Search provisions and degree of advanced features
Frequency of the update
Slide58
NEWS SOURCE GENERATED INFORMATION
Web News Outlets
(continued)
4. Searchable News Data Bases
Definition – Archive-oriented (as opposed to headline) aggregators of multiple news sources
Best are “fee based” (MSU Library)
Dialog
LexisNexis
Dow Jones (Factiva)
ProQuest
Others
Slide59
NEWS SOURCE GENERATED INFORMATION
Web News Outlets
(continued)
4. Searchable News Data Bases (continued)
Web “free” sites
See Suggestions below
Slide60
NEWS SOURCE GENERATED INFORMATION
Web News Outlets
(continued)
5. Industry News Sites
Definition – Industry specific news sites
Created by Trade Associations
Trade Publications (their migration to the Web)
Have combined features of several types above:
News Alert Service
Breaking News Aggregators
If ‘News’ source based, may have archives
Examples: www.foodinstitute.com
Slide61
A Few Suggested News Sites
Google News Archive Search
news.google.com/archivesearch
Claims to go back 300 years
Time, WSJ, NYTimes, The Guardian, The Washington Post
Sources from ProQuest, Factiva, HighBeam, etc.
Some full articles are free, most are fee
Timelines
Advanced Search features
Slide62
A Few Suggested News Sites
See: www.onstrat.com/news/newssearchchart.html
For a comparison of: Yahoo, Google, daypop, rocketnews, findroy, feedster, topix
Slide63
A Few Suggested News Sites
www.monitor.bbc.uk/weekahead.shtml
www.wn.com
www.einnews.com
Subscription business information and online news service which draws from 35,000 sources
Covers 240 countries categorized by country and topic
Headlines Only!! Use to identify stories and then go to library sources
Slide64
A Few Suggested News Sites
News Resource Guides:
www.kidon.com/media-link
Provides info and link to sources and indicates the presence of streaming audio and video
www.abyznewslinks.com
links to newspapers, broadcast stations, internet services, etc.
www.metagrid.com
List of 8000 online magazines and newspapers worldwide
www.newswealth.com Unique categories of miscellaneous ‘news’ sources
Slide65
A Few Suggested News Sites
Front Pages:
www.newseum.org/todaysfrontpages
581 front pages from 54 countries
Alphabetical main page with “Sort by Region” geographic listing
Thumbnail view
www.pressdisplay.com
Front Page Free – 7 day free trial
Full images of news pages of 500 newspapers from 70 countries for SUBSCRIPTION
Zoom and in-paper search features
Slide66
A Few Suggested News Sites
Radio/TV Sources:
www.radio-locator.com
Links to over 10,000 radio stations and over 2500 audio streams from radio stations in 130 countries
www.tvradioworld.com
From over 200 countries
Slide67
Some Conclusions and Cautions
There is great redundancy so be very selective and methodical
One way is to “personalize” your news (Self-confirmation bias)
The nature of news creation and distribution means that there will be more broken links
Spend time becoming an “Information Trapper!”
Slide68
Social Search
What is social search?
No industry standard definition yet.
“Internet wayfinding tools informed by human judgment”
“Informed” can mean many things, including egregiously uninformed.
Slide69
Social Search
Algorithmic Search is “Social”
Algorithms are written by humans who make choices
Now, search engines observe human behavior (click paths, popular URLs, etc.), which is used to modify the algorithm (Yahoo’s 14 tetragigs/day)
“Personalization” efforts are becoming more evident.
Slide70
Social Search
Why now?
Algorithmic search has plateaued
Humans are still better at some things
Rise in cocreation and collaboration via Web 2.0
Recall status of Wikipedia
Social Networking
69% of females (56% of males) ages 17-25 use Facebook
38% of females (14% of males) ages 17-25 use MySpace
70% of those ages 18-21 use social networks
Slide71
Social Search
Issues---
Scale and scope issues – How to keep up and what is the level of “control and policies”?
Tagging – How do you get to a common understanding?
Folksonomy (also known as collaborative tagging, social classification, social indexing, social tagging, and other names) is the practice and method of collaboratively creating and managing tags to annotate and categorize content.
Ambiguity of language (‘orange’)
Others?
Social search will probably work best for non-text content (photos, music, video, widgets, etc.)
Slide72
Social Search
Some Selected Types of Social Search
Shared bookmarks and Web Pages
Tag Engines (blogs and RSS)
Collaborative directories
Personalized vertical search engines
Slide73
Shared Bookmarks
The most basic and probably least useful type of social search
http://del.icio.us/
http://www.shadows.com/
http://myweb2.search.yahoo.com/
http://www.furl.net/
http://www.diigo.com/community
Slide74
Tag Engines
Sometimes called “taggregators”; they primarily search blogs and RSS feeds
http://technorati.com/ - The #1
http://www.ask.com/?tool=bls – Could be the best
http://www.blogpulse.com/ - Monitors and is a Nielsen firm
Slide75
Collaborative directories
Directories created by teams of volunteers
Open Directory Project (AOL) – Has become dated and stale
http://www.prefound.com/
http://www.stumbleupon.com/ - Appears quite good
http://www.mahalo.com/ - Mostly currently popular material
http://www.linkedin.com/ - Professional Networking
Slide76
Personalized Verticals
It is no longer difficult or laborious to create a specialized search engine –
http://www.google.com/coop/cse/
http://www.eurekster.com/
http://rollyo.com/
Slide77
Social Search
Conclusions
Social Search will grow in importance
People are less predictable than algorithms – unlimited potential or problems?
Slide78
Basic Information Trapping
Information Trapping is the process of setting monitors – traps – to capture information from the flow of the Web and have it sent to you.
Term coined by Tara Calishain
http://www.researchbuzz.com
Slide79
Basic Information Trapping
Info Trapping Pros
Faster Results – As it happens, not weeks later
More Results – Don’t have to remember to check
Saves You Time – Not constantly duplicating searches
Info Trapping Cons
The sheer volume can overwhelm you.
Slide80
What Is Trappable?
News Stories
Web Sites
Conversations
Multimedia
Tag Directories
Blogs
Anything with an RSS Feed
Slide81
Basic Information Trapping
This is where ‘the action is’ for:
Consumer research
Image management
Political planning and advertising
Social profiling
Etc.
Slide82
How Do You ‘Trap’?
RSS Feed Readers
Web Page Monitors
E-Mail Alerts
‘Trapline’ Allocation
70% RSS
20% Web pages
10% e-mail alerts
Slide83
Basic Information Trapping
RSS Feeds
Definition - an XML-based specification that allows a Web site to instantly and automatically distribute its content (news and now more) to other sites
Accessing - requires specialized software to be installed by the researcher
Slide84
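Because an RSS feed is plain XML, a very small "trap" can be written with Python's standard library. The sketch below is only an illustration under stated assumptions: `SAMPLE_FEED` is a made-up RSS 2.0 document (a real trap would download the feed from a site first), and the name `trap_items` and the example.com links are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed; a real trap would fetch this text from a Web site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Food Industry News</title>
    <item>
      <title>Organic dairy sales rise</title>
      <link>http://example.com/organic-dairy</link>
      <description>Retail sales of organic milk grew last quarter.</description>
    </item>
    <item>
      <title>New packaging rules announced</title>
      <link>http://example.com/packaging</link>
      <description>Regulators publish updated labeling guidance.</description>
    </item>
  </channel>
</rss>"""


def trap_items(feed_xml, keywords):
    """Return (title, link) for feed items that mention any keyword."""
    root = ET.fromstring(feed_xml)
    wanted = [keyword.lower() for keyword in keywords]
    hits = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        description = item.findtext("description", default="")
        text = (title + " " + description).lower()
        if any(keyword in text for keyword in wanted):
            hits.append((title, link))
    return hits


for title, link in trap_items(SAMPLE_FEED, ["organic"]):
    print(title, "-", link)
```

The keyword list plays the role of the researcher's profile: a narrow, specific list keeps the volume of trapped items manageable.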
Basic Information Trapping
RSS Feeds
(continued)
What is the value of RSS Feeds?
Prequalification
By setting the profile, the user ‘edits’ what information comes into the attention space
However, the researcher still has an obligation to do the editing
Guard against self-confirmation biases
Must have a ‘focused’ relevancy strategy
Slide85
Basic Information Trapping
WEBLOGS a.k.a. BLOGS
Definition - a form of personal journalism where an individual purporting to have knowledge of and interest in a specific topic posts his/her views on the topic on the Web.
Typical Characteristics of Blogs include:
daily postings
recommended links
often have “chatrooms” for forums and discussions
popular blogs now generate advertising
Slide86
Basic Information Trapping
E-mail Alerts are straightforward –
Most run on an RSS platform
Are now readily available
Slide87
Basic Information Trapping
Info Trapping is a separate training session
Some possible tools include:
http://www.aignes.com/ (WebSite Watcher) Free trial, then fee service
http://www.trackle.com/ Modest subscription fee
http://www.rocketnews.com/info/portal.jsp
http://www.boardtracker.com/ (Conversations)
http://boardreader.com/ (Conversations)
http://find.yuku.com/ Web 2.0 (Conversations)
Slide88
Basic Information Trapping
Some possible tools (continued):
http://www.everyzing.com/ (Multimedia/Podcasts)
http://technorati.com/ (everything blogs)
http://www.icerocket.com/ (blogs)
http://www.zuula.com/ (Beta version of a Metasearch engine for Info Trapping)
Slide89
Basic Information Trapping
Requires a fair amount of work
Absolutely requires you have a very specific search query
Requires some advanced skills for managing the “Trapline”
Slide90
The Future (NOW) of Internet Search?
“Blended” or “Universal” Search is becoming the norm
“Personalization” of Search because of algorithm interaction with “YOUR” actual search actions
“Mobilization” will take everything where you are
The battle between Web 1.0 vs. Web 2.0 philosophy