
Information technology in business and society - PowerPoint Presentation

Uploaded On 2016-08-15


Presentation Transcript

Slide1

Information technology in business and society

Session 9: Search and Advertising

Sean J. Taylor

Slide2

Administrativia

Assignment 2 online, due Saturday 2/25 at 1am

Assignment 2 resources

Assignment 3 preview

Guest speaker on Tuesday 2/28: Chrys Wu discussing IT and Journalism

Substitute on Thursday 3/1: Professor Dylan Walker

Slide3

Learning objectives

Learn how search engines rank pages

Learn how to design effectively for high rankings

Learn how online advertising works, especially search ads and keyword auctions

The future of search

Slide4

Search Engines and Web Directories

Resources on the Web that help you find sites with the information and/or services you want.

Directory search engine - organizes listings of Web sites into hierarchical lists.

Search engine - uses software agent technologies ("spiders" or "bots") to search the Web for keywords and place them into indexes.

Slide5

Web directories: example

Advantages? Disadvantages?

Slide6

Search engine examples

Advantages? Disadvantages?

Slide7

Search engines drive ecommerce!

Slide8

Where is consumers' attention?

Slide9

Slide10

Eyetracking study of Google results

Slide11

Search engines discover new pages by following links

They keep track of the words that appear in each page; when you enter a query, the search engine returns a ranked list

Text content is important! But it is not enough! (Why?)
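The word-tracking step above can be sketched as an inverted index. This is a minimal illustration, not how any real search engine is built: the page names, the sample text, and the helpers `build_index` and `search` are all made up for the example, and the query simply returns every page containing all query words.

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to the set of pages that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word of the query (unranked)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical three-page "web"
pages = {
    "page1": "search engines rank pages",
    "page2": "advertising pays for search",
    "page3": "pages link to pages",
}
index = build_index(pages)
hits = search(index, "search pages")
```

The result is an unordered set of matching pages, which is one way to see why text content alone is not enough: the index says which pages match, not which should come first.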

How do search engines rank pages? (Why does this matter?)

How search engines work

Slide12

PageRank is really a "Random Surfer" Model

Random Surfer Model:

What about getting stuck in loops? The damping factor takes care of that.

Let's count the surfers that pass through each point:

Transfer Matrix:

The probability that a surfer follows a link from webpage i to webpage j = [prob. you were not "picked up"] * [prob. of following link i -> j]

The matrix, if page i links to page j:
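One common way to write this transfer matrix is sketched below. It rests on assumptions not stated on the slide: the probability of not being "picked up" is a damping factor d = 0.85, and a surfer who follows a link chooses uniformly among the page's outgoing links. The three-page graph and the function name `transfer_matrix` are hypothetical.

```python
def transfer_matrix(links, d=0.85):
    """Build T where T[i][j] = d * (1 / number of links on page i)
    if page i links to page j, and 0 otherwise.

    d is the assumed probability that the surfer was NOT "picked up".
    """
    pages = sorted(links)
    T = {i: {j: 0.0 for j in pages} for i in pages}
    for i in pages:
        targets = links[i]
        for j in targets:
            T[i][j] = d * (1.0 / len(targets))
    return T

# Hypothetical link graph: page -> pages it links to
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
T = transfer_matrix(links)
```

Each row sums to d rather than 1; the remaining 1 - d is the share of surfers who are "picked up" and dropped on a random page.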

Slide13

Measuring Importance of Linking

PageRank Algorithm

Idea: important pages are pointed to by other important pages

Method:

Each link from one page to another is counted as a “vote” for the destination page

The number of incoming links is important!

But it is not enough!

But each "vote" is different! PageRank gives more weight to votes that come from pages that themselves have a large number of votes (and so on, and so on)

Compare, for example, the circled page in cases A and B

Slide14

Computing PageRank

(ignoring damping factor for illustration)

People who bought this also bought…

BOOK A: book B, book C, book D

BOOK B: book A, book C

BOOK C: book A

BOOK D: book C

Slide15

Computing PageRank

(ignoring damping factor for illustration)

(Book link graph repeated from the previous slide)

Slide16

PageRank

(ignoring damping factor for illustration)

Every book starts with the same score:

BOOK A: .250

BOOK B: .250

BOOK C: .250

BOOK D: .250

Slide17

PageRank

(ignoring damping factor for illustration)

Each book splits its current score equally among the books it links to:

BOOK A (.250) passes .250/3 to each of book B, book C, book D

BOOK B (.250) passes .250/2 to each of book A, book C

BOOK C (.250) passes .250 to book A

BOOK D (.250) passes .250 to book C

Slide18

PageRank

(ignoring damping factor for illustration)

BOOK A passes .250/3 to each of book B, book C, book D

BOOK B passes .250/2 to each of book A, book C

BOOK C passes .250 to book A

BOOK D passes .250 to book C

New scores after summing the incoming shares:

BOOK A: .375

BOOK B: .083

BOOK C: .458

BOOK D: .083

Slide19

PageRank

(ignoring damping factor for illustration)

The same step is repeated with the new scores:

BOOK A (.375) passes .375/3 to each of book B, book C, book D

BOOK B (.083) passes .083/2 to each of book A, book C

BOOK C (.458) passes .458 to book A

BOOK D (.083) passes .083 to book C

Slide20

PageRank

(ignoring damping factor for illustration)

BOOK A passes .375/3 to each of book B, book C, book D

BOOK B passes .083/2 to each of book A, book C

BOOK C passes .458 to book A

BOOK D passes .083 to book C

New scores after summing the incoming shares:

BOOK A: .500

BOOK B: .125

BOOK C: .250

BOOK D: .125

Slide21

PageRank

Scores after repeating the update until they stop changing:

BOOK A: .400

BOOK B: .133

BOOK C: .333

BOOK D: .133

At these scores every book receives exactly what it gives away:

BOOK A passes .400/3 to each of book B, book C, book D

BOOK B passes .133/2 to each of book A, book C

BOOK C passes .333 to book A

BOOK D passes .133 to book C
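The book example can be replayed in a few lines of Python. This is a minimal sketch that ignores the damping factor, as the slides do; the names `LINKS`, `pagerank_step` and `pagerank` are chosen for this illustration only.

```python
# "People who bought this also bought…" links from the book example
LINKS = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank_step(scores, links):
    """One update: every page splits its score equally among its links."""
    new_scores = {page: 0.0 for page in links}
    for page, targets in links.items():
        share = scores[page] / len(targets)
        for target in targets:
            new_scores[target] += share
    return new_scores

def pagerank(links, iterations):
    """Start from equal scores and apply the update a fixed number of times."""
    scores = {page: 1.0 / len(links) for page in links}
    for _ in range(iterations):
        scores = pagerank_step(scores, links)
    return scores

first = pagerank(LINKS, 1)    # A=.375, B=.083, C=.458, D=.083
second = pagerank(LINKS, 2)   # A=.500, B=.125, C=.250, D=.125
final = pagerank(LINKS, 100)  # approaches A=.400, B=.133, C=.333, D=.133
```

The fixed number of iterations is a simplification; a fuller version would stop once the scores change by less than some tolerance.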

(ignoring damping factor for illustration)

Slide22

Gaming PageRank and Trust

TrustRank Algorithm

Initial votes come only from trusted pages
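The idea of letting votes start only from trusted pages can be sketched as a PageRank-style update whose starting scores and random jumps are restricted to a trusted set. Everything here is an assumption for illustration: the damping value 0.85, the page names, and the function `trust_scores`. Pages without outgoing links simply lose their score in this simplified version.

```python
def trust_scores(links, trusted, d=0.85, iterations=50):
    """Propagate trust from a hand-picked set of trusted pages along links."""
    pages = sorted(links)
    # Only trusted pages receive an initial vote (and the random-jump share).
    seed = {p: (1.0 / len(trusted) if p in trusted else 0.0) for p in pages}
    score = dict(seed)
    for _ in range(iterations):
        new_score = {p: (1 - d) * seed[p] for p in pages}
        for p in pages:
            for q in links[p]:
                new_score[q] += d * score[p] / len(links[p])
        score = new_score
    return score

# Hypothetical graph: one page is linked from a trusted page,
# another only from untrusted sources.
links = {
    "trusted": ["good"],
    "good": [],
    "spam1": ["target"],
    "spam2": ["target"],
    "target": [],
}
scores = trust_scores(links, trusted={"trusted"})
```

In this toy graph the page linked only from untrusted sources ends with a score of zero, however many such links it collects.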

Compare, for example, the circled page in cases A and B

(Figure labels: trusted page, trusted page, links from untrusted sources)

Slide23

Simulating Changes in PageRank

Baseline scores for the book link graph (BOOK A links to book B, book C, book D; BOOK B links to book A, book C; BOOK C links to book A; BOOK D links to book C):

BOOK A: .400

BOOK B: .133

BOOK C: .333

BOOK D: .133

Change: PR of A, PR of C

C cuts link to A: 0.18, 0.50

C links to B: 0.38, 0.33

C links to D: 0.24, 0.40

C links to B & D: 0.22, 0.38

Slide24

Importance of anchor text

<a href=http://www.sims…>INFOSYS 141</a>

<a href=http://www.sims…>A terrific course on search engines</a>
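A crawler can collect the anchor text of each link as it parses a page. The sketch below uses Python's standard `html.parser` module; the class name `AnchorTextCollector` and the example.com URL are placeholders for this illustration, not part of the course material.

```python
from html.parser import HTMLParser

class AnchorTextCollector(HTMLParser):
    """Collect (href, anchor text) pairs from the <a> tags of a page."""

    def __init__(self):
        super().__init__()
        self.anchors = []   # list of (href, anchor text)
        self._href = None   # href of the link currently being read
        self._text = []     # text pieces seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.anchors.append((self._href, text))
            self._href = None

# Placeholder page with one link
html = ('<p>See <a href="http://example.com/course">A terrific course '
        'on search engines</a>.</p>')
collector = AnchorTextCollector()
collector.feed(html)
```

The collected text can then be indexed as a description of the target page, in addition to the words on the target page itself.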

The anchor text summarizes what the website is about.

Slide25

Other ranking factors

Location, Location, Location...and Frequency

Query words in title, or in first few sentences

The more frequent the query words, the better

Click-through measurement

How often users click on your URL when they see it

How long they stay (using toolbars!)

Slide26

Outline

Learn how search engines rank pages

Learn how to design effectively for high rankings

Learn how online advertising works, especially search ads and keyword auctions

The future of search

Slide27

Achieving Higher Results Rankings

Position your keywords (title, headings, early on page)

Make text visible (no tiny fonts, no white-on-white)

Frames can kill

Have relevant content

Do not change topics

Just say no to search engine spamming

Submit your key pages

Verify your listing often

Slide28

Manipulating Rankings

Motives

Commercial, political, religious, lobbies

Promotion funded by advertising budget

Operators

Contractors (Search Engine Optimizers) for lobbies, companies

Web masters

Hosting services

What are the techniques used by rankings manipulators?

Slide29

Manipulation technologies

Cloaking

Serve fake content to search engine robot

DNS cloaking: Switch IP address. Impersonate

Cloaking (diagram): Is this a search engine spider? Y: fake doc. N: SPAM

Doorway pages

Pages optimized for a single keyword that re-direct to the real target page

Keyword Spam

Misleading meta-keywords, excessive repetition of a term, fake "anchor text"

Hidden text with colors, CSS tricks, etc.

Meta-Keywords = "… London hotels, hotel, holiday inn, hilton, discount, booking, reservation, sex, mp3, britney spears, viagra, …"

Link spamming

Mutual admiration societies, hidden links, awards

Domain flooding: numerous domains that point or re-direct to a target page

Robots

Fake click stream

Fake query stream

Risky to use any of these, as search engines are getting better at detecting and punishing them

Slide30

Outline

Learn how search engines rank pages

Learn how to design effectively for high rankings

Learn how online advertising works, especially search ads and keyword auctions

The future of search

Slide31

Paid Ranking

Keyword bidding for targeted ads

Pay-per-click

Higher bids result in higher ranks for the ad

A higher percentage of clicks on the ad increases the rank as well (why?)

Google's AdWords is the biggest player

Google's 2007 revenue was more than $16 billion, 2008 ~ $22 billion, mostly from such ads
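One way to see why both the bid and the click percentage matter: under pay-per-click, the search engine earns roughly bid times click-through rate each time the ad is shown. The sketch below ranks ads by that product. It is a simplified assumption for illustration, not the actual AdWords formula, and the ad names, bids and click rates are made up.

```python
def rank_ads(ads):
    """Order ads by expected revenue per impression: bid per click * click rate."""
    return sorted(ads, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)

# Hypothetical ads: bid in dollars per click, ctr = fraction of viewers who click
ads = [
    {"name": "ad_high_bid", "bid": 2.00, "ctr": 0.01},
    {"name": "ad_popular", "bid": 1.00, "ctr": 0.05},
    {"name": "ad_middle", "bid": 1.50, "ctr": 0.02},
]
ranking = [ad["name"] for ad in rank_ads(ads)]
```

Here the ad with the lowest bid ranks first because users click on it five times as often as on the highest bidder's ad.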

Promoting without Manipulation: Paid placement

Slide32

Example AdWords Placement

(Figure labels: AdWords placement, most relevant sites)

Slide33

Slide34

Fund Your Website: AdSense

Google also delivers ads to other websites

Sign up for Google AdSense, and Google delivers ads to your website (common source of income for "professional" bloggers)

How ads are delivered:

If the website is the best match for the targeted keywords

If users of the website click on results

Strategies for successful ads:

Place the ads on top

Blend with the rest of the website

Ads at the bottom are ignored consistently

Slide35

Example: Washington Post Website

Slide36

Analysis of Washington Post Website

Slide37

Targeting Banner Ads

Request for Ad from Ad Server

IP Address

Country, Domain, Company

Browser, Operating System

Surfing Behavior from cookies

Demographic Data?

Targeted Ad is Delivered to User

Context: Movie reviews

User Profile: NYU user, New York

Slide38

Closed Loop Marketing

User Visits Publisher Sites

Ads Delivered By DART For Advertisers

Boomerang Captures User Action Data

Data Analysis, Databank

Boomerang Compiles & Reports Response For Future Targeting

User Clicks & Visits Advertiser's Site

Source: Doubleclick, Inc.

Slide39

Future of Search

Information Extraction: Search on Structured Data

Social Search

Privacy Preserving Search

Slide40

Information Extraction

Information extraction applications extract structured relations from unstructured text

May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis…

Disease Outbreaks in The New York Times

Date        Disease Name      Location
Jan. 1995   Malaria           Ethiopia
July 1995   Mad Cow Disease   U.K.
Feb. 1995   Pneumonia         U.S.
May 1995    Ebola             Zaire
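A toy version of this extraction step is sketched below. It assumes a hand-written date pattern and tiny word lists for diseases and locations, all invented for this example; a real information extraction system is far more sophisticated than a lookup like this.

```python
import re

# Assumed toy lexicons, taken from the table above
DISEASES = ["Ebola", "Malaria", "Pneumonia", "Mad Cow Disease"]
LOCATIONS = ["Zaire", "Ethiopia", "U.K.", "U.S."]

# Month name, optional day, four-digit year (e.g. "May 19 1995")
DATE = re.compile(
    r"\b(Jan|Feb|Mar|Apr|May|June|July|Aug|Sept|Oct|Nov|Dec)\.? "
    r"(?:\d{1,2},? )?(\d{4})\b"
)

def extract_outbreak(text):
    """Return a (date, disease, location) record, or None if a field is missing."""
    date = DATE.search(text)
    disease = next((d for d in DISEASES if d in text), None)
    location = next((loc for loc in LOCATIONS if loc in text), None)
    if not (date and disease and location):
        return None
    return {
        "date": f"{date.group(1)} {date.group(2)}",
        "disease": disease,
        "location": location,
    }

text = ("May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, "
        "which is in the front line of the world's response to the deadly Ebola "
        "epidemic in Zaire, is finding itself hard pressed to cope with the crisis")
record = extract_outbreak(text)
```

Applied to the news snippet above, this produces the last row of the table: May 1995, Ebola, Zaire.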

Information Extraction System (e.g., NYU's Proteus)

Slide41

Return structured answers, not Webpages

Slide42

Future of Search

Information Extraction: Search on Structured Data

Social Search

Privacy Preserving Search

Slide43

Y! Answers

Launched in second half of 2005

Incentive system based on points and voting for best answers

Questions grouped by category

Some statistics:

over 60 million users

over 120 million answers, available in 18 countries and in 6 languages

Slide44

Slide45

Y! Answers

Slide46

Y! Answers

Slide47

Long-term Prospects

Questions follow a power law:

A large number of questions will be asked by many people (20% of questions account for 80% of requests)

We only need one answer for each question

Quickly acquire high-quality answers for 80% of queries

…in time, people will take care of the "long tail" of the remaining questions

Slide48

Future of Search

Information Extraction: Search on Structured Data

Social Search

Privacy Preserving Search

Slide49

Privacy Preserving Search

Slide50

Next Class: Social Networks

Work on Assignment 2