Slide1
Information technology in business and society
Session 9: Search and Advertising
Sean J. Taylor
Slide2
Administrivia
Assignment 2 online, due Saturday 2/25 at 1am
Assignment 2 resources
Assignment 3 preview
Guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism
Substitute on Thursday 3/1: Professor Dylan Walker
Slide3
Learning objectives
Learn how search engines rank pages
Learn how to design effectively for high rankings
Learn how online advertising works, especially search ads and keyword auctions
The future of search
Slide4
Search Engines and Web Directories
Resources on the Web that help you find sites with the information and/or services you want.
Directory search engine - organizes listings of Web sites into hierarchical lists.
Search engine - uses software agent technologies (“spiders” or “bots”) to search the Web for keywords and place them into indexes.
Slide5
Web directories: example
Advantages? Disadvantages?
Slide6
Search engine examples
Advantages? Disadvantages?
Slide7
Search engines drive ecommerce!
Slide8
Where is consumers' attention?
Slide9
Slide10
Eyetracking study of Google Results
Slide11
Search engines discover new pages by following links.
They keep track of the words that appear in each page; when you enter a query, the search engine returns a ranked list of matching pages.
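The two steps above (follow links, then index words) can be sketched in a few lines. This is a toy illustration only: the four pages in `PAGES`, their text, and the "count matching words" ranking are made-up assumptions, not how any real search engine ranks.

```python
from collections import defaultdict, deque

# Hypothetical miniature "web": page -> (text, outgoing links).
PAGES = {
    "home": ("search engines rank pages", ["about", "ads"]),
    "about": ("search advertising course", ["home"]),
    "ads": ("keyword auctions for search ads", []),
    "orphan": ("no page links here", []),
}

def crawl(start):
    """Discover pages by following links, breadth-first, from a start page."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for target in PAGES[page][1]:
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def build_index(pages):
    """Map each word to the set of crawled pages that contain it."""
    index = defaultdict(set)
    for page in pages:
        for word in PAGES[page][0].split():
            index[word].add(page)
    return index

def search(index, query):
    """Return pages containing the query words, most matching words first."""
    hits = defaultdict(int)
    for word in query.lower().split():
        for page in index.get(word, ()):
            hits[page] += 1
    return sorted(hits, key=lambda p: (-hits[p], p))

index = build_index(crawl("home"))
print(search(index, "search ads"))
```

Note that the page named `orphan` never enters the index, because no crawled page links to it.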
Text content is important, but it is not enough! (Why?)
How do search engines rank pages? (Why does this matter?)
How search engines work
Slide12
PageRank is really a “Random Surfer” Model
Random Surfer Model:
What about getting stuck in loops? The chance of being “picked up” (the damping factor) takes care of that.
Let’s count the surfers that pass through each point.
Transfer Matrix:
The probability that a surfer follows a link from webpage i to webpage j = [prob. you were not “picked up”] * [prob. of following link i -> j]
The matrix entry for (i, j) takes this value if page i links to page j.
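One way to write the transfer matrix down is sketched here. The slide's own symbols did not survive, so the notation is an assumption: p is the probability of being "picked up", k_i is the number of links on page i, N is the number of pages, and n_j(t) is the share of surfers on page j at step t. It also assumes that the surfer chooses uniformly among the links on a page, that every page has at least one link, and that picked-up surfers are dropped on a uniformly random page.

```latex
% Assumed notation; the original slide's symbols were lost in extraction.
T_{ij} =
\begin{cases}
  (1 - p)\,\dfrac{1}{k_i} & \text{if page } i \text{ links to page } j,\\[4pt]
  0 & \text{otherwise,}
\end{cases}
\qquad
n_j(t+1) = \frac{p}{N} + \sum_{i} T_{ij}\, n_i(t).
```

Under these assumptions the shares still sum to 1 after every step, since the picked-up fraction p is redistributed across all N pages.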
Slide13
Measuring Importance of Linking
PageRank Algorithm
Idea: important pages are pointed to by other important pages
Method:
Each link from one page to another is counted as a “vote” for the destination page
The number of incoming links is important, but it is not enough!
Each “vote” is different: PageRank gives more weight to votes that come from pages that themselves have a large number of votes (and so on, and so on).
Compare, for example, the circled page in cases A and B.
[Figure: cases A and B]
Slide14
Computing PageRank
People who bought this also bought…
Book A: book B, book C, book D
Book B: book A, book C
Book C: book A
Book D: book C
(ignoring damping factor for illustration)
Slide15
Computing PageRank
(ignoring damping factor for illustration)
Slide16
PageRank
[Figure: same book links as on the Computing PageRank slide]
Initial PageRank: A = .250, B = .250, C = .250, D = .250
(ignoring damping factor for illustration)
Slide17
PageRank
PageRank: A = .250, B = .250, C = .250, D = .250
Each book splits its PageRank evenly across its links:
A sends .250/3 to each of B, C, D
B sends .250/2 to each of A, C
C sends .250 to A
D sends .250 to C
(ignoring damping factor for illustration)
Slide18
PageRank
A sends .250/3 to each of B, C, D
B sends .250/2 to each of A, C
C sends .250 to A
D sends .250 to C
New PageRank: A = .375, B = .083, C = .458, D = .083
(ignoring damping factor for illustration)
Slide19
PageRank
PageRank: A = .375, B = .083, C = .458, D = .083
A sends .375/3 to each of B, C, D
B sends .083/2 to each of A, C
C sends .458 to A
D sends .083 to C
(ignoring damping factor for illustration)
Slide20
PageRank
A sends .375/3 to each of B, C, D
B sends .083/2 to each of A, C
C sends .458 to A
D sends .083 to C
New PageRank: A = .500, B = .125, C = .250, D = .125
(ignoring damping factor for illustration)
Slide21
PageRank
PageRank: A = .400, B = .133, C = .333, D = .133
A sends .400/3 to each of B, C, D
B sends .133/2 to each of A, C
C sends .333 to A
D sends .133 to C
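The rounds shown on these slides can be replayed with a short script. This is a minimal sketch that, like the slides, ignores the damping factor; the function names are illustrative choices. Each round, every book splits its current PageRank evenly across the books it links to.

```python
# Link structure from the slides: each book lists the books it links to.
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank_step(rank):
    """One round: every page splits its current rank evenly over its links."""
    new_rank = {page: 0.0 for page in LINKS}
    for page, targets in LINKS.items():
        share = rank[page] / len(targets)
        for target in targets:
            new_rank[target] += share
    return new_rank

def pagerank(rounds):
    """Start every page at 1/N and apply the given number of rounds."""
    rank = {page: 1.0 / len(LINKS) for page in LINKS}
    for _ in range(rounds):
        rank = pagerank_step(rank)
    return rank

for rounds in (1, 2, 100):
    ranks = pagerank(rounds)
    print(rounds, {page: round(value, 3) for page, value in ranks.items()})
```

One and two rounds give the intermediate values on the slides (.375/.083/.458/.083, then .500/.125/.250/.125), and after many rounds the values settle at A = .400, B = .133, C = .333, D = .133, where another round no longer changes them.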
(ignoring damping factor for illustration)
Slide22
Gaming PageRank and Trust
TrustRank Algorithm
Initial votes come only from trusted pages
Compare, for example, the circled page in cases A and B.
[Figure: cases A and B, showing links from trusted pages and links from untrusted sources]
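A rough sketch of the idea that initial votes come only from trusted pages follows. The graph, the page names, the seed set, and the `follow_prob` value of 0.85 are all assumptions for illustration, and the update rule is a simplified trust propagation, not necessarily the exact published TrustRank procedure.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "trusted_1": ["page_b"],
    "trusted_2": ["page_b"],
    "spam_1": ["page_a"],
    "spam_2": ["page_a"],
    "page_a": [],
    "page_b": [],
}
TRUSTED = {"trusted_1", "trusted_2"}  # hand-picked seed set (assumption)

def trust_scores(links, trusted, follow_prob=0.85, rounds=50):
    """Propagate trust along links, starting only from the trusted seeds.

    follow_prob is an assumed damping value; the remaining share of the
    score is handed back to the seed pages each round.
    """
    seed = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in links}
    trust = dict(seed)
    for _ in range(rounds):
        new_trust = {p: (1.0 - follow_prob) * seed[p] for p in links}
        for page, targets in links.items():
            for target in targets:
                new_trust[target] += follow_prob * trust[page] / len(targets)
        trust = new_trust
    return trust

scores = trust_scores(LINKS, TRUSTED)
print({page: round(score, 3) for page, score in scores.items()})
```

In this toy graph `page_a` and `page_b` each receive two incoming links, but only `page_b` ends up with a nonzero score, because the untrusted pages have no trust to pass on.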
Slide23
Simulating Changes in PageRank
Baseline (book links as on the Computing PageRank slide): A = .400, B = .133, C = .333, D = .133
Change              PR of A   PR of C
C cuts link to A    0.18      0.50
C links to B        0.38      0.33
C links to D        0.24      0.40
C links to B & D    0.22      0.38
Slide24
Importance of anchor text
<a href=http://www.sims…>INFOSYS 141</a>
<a href=http://www.sims…>A terrific course on search engines</a>
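To see what an indexer can read off links like these, here is a small sketch using Python's standard `html.parser` module. The URL `http://example.edu/course` is a hypothetical stand-in, since the address on the slide is truncated, and the class name is an illustrative choice.

```python
from html.parser import HTMLParser

class AnchorTextParser(HTMLParser):
    """Collect (href, anchor text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (href, text) pairs
        self._href = None      # href of the <a> currently open, if any
        self._text = []        # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse line breaks and repeated spaces inside the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

# Hypothetical URL: the address on the slide is truncated.
html = """
<a href="http://example.edu/course">INFOSYS 141</a>
<a href="http://example.edu/course">A terrific course on search
engines</a>
"""
parser = AnchorTextParser()
parser.feed(html)
print(parser.links)
```

Both links point at the same page, but only the second one tells an indexer that the target is about search engines.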
The anchor text summarizes what the website is about.
Slide25
Other ranking factors
Location, Location, Location...and Frequency
Query words in title, or in first few sentences
The more frequent the query words, the better
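A toy scoring function shows how location and frequency could combine. The weights (3 for a title hit, 2 for an early hit, 1 per occurrence) and the 20-word cutoff for "early" are invented for illustration; real engines do not publish theirs.

```python
def text_score(query, title, body, early_words=20):
    """Score a page for a query from word location and frequency.

    The weights (3 for a title hit, 2 for an early hit, 1 per occurrence)
    are made-up values for illustration only.
    """
    title_words = title.lower().split()
    body_words = body.lower().split()
    score = 0
    for word in query.lower().split():
        if word in title_words:
            score += 3                      # query word in the title
        if word in body_words[:early_words]:
            score += 2                      # query word early on the page
        score += body_words.count(word)     # frequency in the body
    return score

page_1 = text_score("search ads", "Search ads explained",
                    "search ads are sold by auction")
page_2 = text_score("search ads", "Course notes",
                    "this week we cover auctions")
print(page_1, page_2)
```

A rule this simple rewards repetition without limit, which is one reason text content alone is not enough for ranking.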
Click-through measurement
How often users click on your URL when they see it
How long they stay (measured using toolbars!)
Slide26
Outline
Learn how search engines rank pages
Learn how to design effectively for high rankings
Learn how online advertising works, especially search ads and keyword auctions
The future of search
Slide27
Achieving Higher Results Rankings
Position your keywords (title, headings, early on page)
Make text visible (no tiny fonts, no white-on-white)
Frames can kill
Have relevant content
Do not change topics
Just say no to search engine spamming
Submit your key pages
Verify your listing often
Slide28
Manipulating Rankings
Motives
Commercial, political, religious, lobbies
Promotion funded by advertising budget
Operators
Contractors (Search Engine Optimizers) for lobbies, companies
Web masters
Hosting services
What are the techniques used by rankings manipulators?
Slide29
Manipulation technologies
Cloaking
Serve fake content to search engine robot
DNS cloaking: Switch IP address. Impersonate
Doorway pages
Pages optimized for a single keyword that re-direct to the real target page
Keyword Spam
Misleading meta-keywords, excessive repetition of a term, fake “anchor text”
Hidden text with colors, CSS tricks, etc.
Link spamming
Mutual admiration societies, hidden links, awards
Domain flooding: numerous domains that point or re-direct to a target page
Robots
Fake click stream
Fake query stream
[Figure: Cloaking. Is this a Search Engine spider? N: SPAM. Y: Fake Doc]
Meta-Keywords = “… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, viagra, …”
Risky to use any of these, as search engines are getting better at detecting and punishing them
Slide30
Outline
Learn how search engines rank pages
Learn how to design effectively for high rankings
Learn how online advertising works, especially search ads and keyword auctions
The future of search
Slide31
Paid Ranking
Keyword bidding for targeted ads
Pay-per-click
Higher bids result in higher ranks for the ad
A higher percentage of clicks on the ad increases the rank as well (why?)
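One plausible answer to the "(why?)": under pay-per-click the search engine is paid only when an ad is clicked, so an ad's expected revenue per showing is its bid times its click-through rate. The sketch below ranks ads on that product. The ads, bids, and click rates are made up, and this is a simplification for illustration, not AdWords' actual formula.

```python
# Hypothetical ads: (name, bid per click in dollars, click-through rate).
ADS = [
    ("ad_high_bid", 2.00, 0.01),
    ("ad_popular", 1.00, 0.05),
    ("ad_low", 0.50, 0.02),
]

def rank_ads(ads):
    """Order ads by expected revenue per impression: bid * click-through rate."""
    return sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)

for name, bid, ctr in rank_ads(ADS):
    print(f"{name}: expected revenue per impression = ${bid * ctr:.3f}")
```

Here the ad with the highest bid does not take the top slot, because the cheaper ad is clicked five times as often.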
Google's AdWords is the biggest player
Google’s 2007 revenue was more than $16 billion, and its 2008 revenue about $22 billion, mostly from such ads
Promoting without Manipulation: Paid placement
Slide32
Example AdWords Placement
[Figure labels: AdWords Placement; Most relevant sites]
Slide33
Slide34
Fund Your Website: AdSense
Google also delivers ads to other websites
Sign up for Google AdSense, and Google delivers ads to your website (common source of income for “professional” bloggers)
How ads are delivered:
If the website is the best match for the targeted keywords
If users of the website click on the results
Strategies for successful ads:
Place the ads on top
Blend with the rest of the website
Ads at the bottom are ignored consistently
Slide35
Example: Washington Post Website
Slide36
Analysis of Washington Post Website
Slide37
Targeting Banner Ads
Request for Ad from Ad Server
IP Address
Country, Domain, Company
Browser, Operating System
Surfing Behavior from cookies
Demographic Data?
Targeted Ad is Delivered to User
Context: Movie reviews
User Profile: NYU user, New York
Slide38
Closed Loop Marketing
User Visits Publisher Sites
Ads Delivered By DART For Advertisers
Boomerang Captures User Action Data
Data Analysis
Databank
Boomerang Compiles & Reports Response For Future Targeting
User Clicks & Visits Advertiser’s Site
Source: DoubleClick, Inc.
Slide39
Future of Search
Information Extraction: Search on Structured Data
Social Search
Privacy Preserving Search
Slide40
Information Extraction
Information extraction applications extract structured relations from unstructured text
May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…
Disease Outbreaks in The New York Times
Date        Disease Name      Location
Jan. 1995   Malaria           Ethiopia
July 1995   Mad Cow Disease   U.K.
Feb. 1995   Pneumonia         U.S.
May 1995    Ebola             Zaire
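A toy version of the step from article text to table row is sketched below. It relies on small hand-made word lists and one date pattern, all of which are assumptions for illustration; a real extraction system is far more sophisticated than keyword lookup.

```python
import re

# Tiny hand-made word lists standing in for a real extraction system.
DISEASES = ["Mad Cow Disease", "Ebola", "Malaria", "Pneumonia"]
LOCATIONS = ["Zaire", "Ethiopia", "U.K.", "U.S."]
# Month name, optional day of month, then a four-digit year.
DATE_PATTERN = (r"(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
                r"(?:\s+\d{1,2},?)?\s+(\d{4})")

def first_match(candidates, text):
    """Return the first listed term that appears in the text, else None."""
    for term in candidates:
        if term in text:
            return term
    return None

def extract_outbreak(text):
    """Turn free text into a (date, disease, location) record."""
    date = re.search(DATE_PATTERN, text)
    return {
        "date": f"{date.group(1)} {date.group(2)}" if date else None,
        "disease": first_match(DISEASES, text),
        "location": first_match(LOCATIONS, text),
    }

article = ("May 19 1995, Atlanta -- The Centers for Disease Control and "
           "Prevention, which is in the front line of the world's response "
           "to the deadly Ebola epidemic in Zaire, is finding itself hard "
           "pressed to cope with the crisis")
print(extract_outbreak(article))
```

For the sample article this yields the May 1995 / Ebola / Zaire row of the table.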
Information Extraction System (e.g., NYU’s Proteus)
Slide41
Return structured answers, not Webpages
Slide42
Future of Search
Information Extraction: Search on Structured Data
Social Search
Privacy Preserving Search
Slide43
Y! Answers
Launched in second half of 2005
Incentive system based on points and voting for best answers
Questions grouped by category
Some statistics:
over 60 million users
over 120 million answers, available in 18 countries and in 6 languages
Slide44
Slide45
Y! Answers
Slide46
Y! Answers
Slide47
Long-term Prospects
Questions follow a power-law:
A large number of questions will be asked by many people (20% of questions account for 80% of requests)
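A quick check of the 20%/80% figure under one assumed power law: suppose the k-th most popular question is asked in proportion to 1/k (a Zipf-style distribution). The exponent and the count of 1000 distinct questions are illustrative choices, not data from Y! Answers.

```python
def top_share(num_questions, top_fraction=0.2, exponent=1.0):
    """Share of all requests that go to the most popular questions.

    Assumes the k-th most popular question is asked in proportion to
    1 / k**exponent (a Zipf-style power law); both the exponent and the
    number of questions are illustrative choices.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, num_questions + 1)]
    top_count = int(num_questions * top_fraction)
    return sum(weights[:top_count]) / sum(weights)

print(f"{top_share(1000):.0%} of requests go to the top 20% of questions")
```

With these assumptions the share comes out close to the 80% quoted above; a different exponent or question count would shift it.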
We only need one answer for each question
Quickly acquire high-quality answers for 80% of queries
…people will, in time, take care of the “long tail” of the remaining questions
Slide48
Future of Search
Information Extraction: Search on Structured Data
Social Search
Privacy Preserving Search
Slide49
Privacy Preserving Search
Slide50
Next Class:
Social Networks
Work on Assignment 2