Measuring and Fingerprinting click-spam in ad networks

Uploaded On 2018-10-21



Presentation Transcript

Slide1

Measuring and Fingerprinting click-spam in ad networks

By Vacha Dave, Saikat Guha, and Yin Zhang

Presenter: Uddipan Chatterjee

Slide2

Key Ideas

Advertising plays an important role in the promotion and sale of products.

Click-spam is the fraudulent process of getting a user to click on a link (leading to an ad) in which they have no interest.

Because the advertiser is charged for every click, the bulk of the advertising revenue ends up going to the click-spammers.

Slide3

Related Works

Documenting Click-Spam:

N. Daswani and M. Stoppelman. The anatomy of clickbot.A. In Proceedings of HotBots, 2007.

B. Miller, P. Pearce, C. Grier, C. Kreibich, and V. Paxson. What’s clicking what? Techniques and innovations of today’s clickbots. In Proceedings of the 8th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, DIMVA’11, 2011.

Measuring Traffic Quality:

N. Immorlica, K. Jain, M. Mahdian, and K. Talwar. Click Fraud Resistant Methods for Learning Click-Through Rates. In Proceedings of the Workshop on Internet and Network Economics (WINE ’05).

Q. Zhang, T. Ristenpart, S. Savage, and G. M. Voelker. Got traffic?: an evaluation of click traffic providers. In Proceedings of the 2011 Joint WICOW/AIRWeb Workshop on Web Quality, WebQuality ’11, 2011.

Slide4

Motivation

Click-spam costs online advertisers hundreds of millions of dollars each year.

It misdirects users who have no interest in the ad.

None of the ad networks releases specifics about click-spam.

Although in-house heuristics are available, their false-positive and false-negative rates are high.

Slide5

Research Goal

Identify ongoing click-spam attacks in various ad networks.

Analyze in depth and independently measure click-spam attack rates.

Develop a methodology to detect simultaneous click-spam attacks.

Slide6

Advertising Primer: Ad delivery

The user visits the publisher’s website and gets an ad box in return.

The user’s browser contacts the ad network.

The request identifies the referring website to the ad network.

The ad network populates the ad box.

Slide7

Advertising Primer: Charging Model

The most common charging model is pay-per-click (PPC or CPC).

The advertiser pays the ad network when the user clicks on the ad.

The publisher gets around 70% of the revenue that the ad network collects from the advertiser.
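
The arithmetic of this charging model can be sketched as below. This is a minimal illustration, not any ad network's actual billing logic: the function name `settle_clicks`, the click count, the price, and the 10% discount are all hypothetical; only the roughly 70% publisher share and the fact that ad networks discount click-spam before billing come from this slide.

```python
def settle_clicks(clicks, cost_per_click, spam_discount, publisher_share=0.70):
    """Split pay-per-click revenue for one billing period.

    clicks          -- raw number of clicks recorded (hypothetical input)
    cost_per_click  -- price the advertiser agreed to pay per click
    spam_discount   -- fraction of clicks the ad network writes off as click-spam
    publisher_share -- fraction of collected revenue passed to the publisher
    """
    billable_clicks = clicks * (1 - spam_discount)
    advertiser_bill = billable_clicks * cost_per_click
    publisher_payout = advertiser_bill * publisher_share
    network_keeps = advertiser_bill - publisher_payout
    return advertiser_bill, publisher_payout, network_keeps


# Hypothetical period: 1000 clicks at 0.50 each, 10% discounted as click-spam.
bill, payout, kept = settle_clicks(1000, 0.50, 0.10)
print(f"advertiser pays {bill:.2f}, publisher gets {payout:.2f}, network keeps {kept:.2f}")
```

Any click-spam that the discount misses is billed at full price, which is why the size of that discount matters to the advertiser.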

Ad networks apply a click-spam discount, based on in-house heuristics, before billing the advertiser.

Slide8

Click-spam estimation: challenges

No Ground Truth

Click-spam: a click that the user did not intend.

There is no certain way of knowing user intention.

No Global View

The ad network cannot track users on advertiser sites.

The advertiser has no knowledge of the user’s engagement with other advertisers.

Granularity

Noise

Slide9

The Approach: Summary

A Bayesian approach is used to make up for the lack of ground truth.

Create two situations in which the fraction of click-spam traffic is different.

Link these two situations using a Bayesian formula to cancel out the quantities that cannot be measured.

The remaining quantities can be measured locally by the advertiser.
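
The cancellation can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the slides: $I_d$ and $I_i$ are the events that a user arriving directly or via the interstitial page intended the click, and $G_d$ and $G_i$ are the events that such a user is a gold-standard user. The sketch further assumes that every gold-standard user intended the click, and that the unmeasurable rate $P(G \mid I)$ is the same on both paths.

```latex
\begin{align*}
P(G_d) &= P(G_d \mid I_d)\,P(I_d), \qquad P(G_i) = P(G_i \mid I_i)\,P(I_i) \\
\frac{P(G_d)}{P(G_i)} &= \frac{P(I_d)}{P(I_i)} \quad \text{when } P(G_d \mid I_d) = P(G_i \mid I_i) \\
P(I_d) &= P(I_i)\,\frac{P(G_d)}{P(G_i)}
\end{align*}
```

Under these assumptions the rate $P(G \mid I)$ drops out, and the intent probability on the direct path is tied to the gold-standard fractions on the two paths, which the advertiser can count.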

The approach does not identify individual click-spam clicks; it gives an estimate of the fraction of click-spam among the total number of clicks.

Slide10

Data Collection: Basics

Interstitial Page

Gold-Standard User Check

Slide11

Data Collection: Assumptions

Assumption 1: The interstitial page will turn a large portion of the click-spam traffic away.

Assumption 2: The percentage of gold-standard users does not decrease because of the interstitial page.

Assumption 3: The click-spam click-through ratio is independent of the text of the ad (for the bluff ads).

Bluff Ads

Slide12

Bayesian Estimation

Two events are defined for gold-standard users: the user is a gold-standard user who arrives directly, or one who arrives via the interstitial page.

Two further events are defined for intent: the user intended to click the ad, taken over all users reaching the landing page directly or over all users reaching it via the interstitial page, respectively.

The advertiser is interested in the probability of the event:

On simplification the equation becomes:

Slide13

Bayesian Estimation

Final Estimation Formula: The final formula for estimating click-spam for the original ad is expressed as follows:

The quantities in the formula are:

the number of gold-standard users arriving directly or through the interstitial page;

the number of impressions of the original ad and the control ad;

the number of clicks via the interstitial page reaching the landing page, for the original and control ads;

the number of clicks on the original ad directly reaching the landing page.

Slide14

Limitations

The advertiser has to actively measure click-spam.

The imposed interstitial page and control ads harm the user experience, turning away many potential gold-standard users.

If applied reactively to suspected click-spam traffic, the amount of data becomes scarce.

Finally, the approach is naturally sensitive to three choices made by the advertiser: the interstitial page, the definition of a gold-standard user, and the control ads.

Slide15

Click-spam measurement: methodology

The authors signed up with the ad network as 3 different advertisers targeting 3 keywords of high, medium, and low popularity, respectively, from a ranked list of keywords.

Realistic-looking landing pages were created, capable of tracking mouse activity, time spent on the page, and switching of browser tabs into or away from the page.

Next, three types of interstitial pages were created, along with 4 ads for each landing page.

4 additional control ads were created for each landing page, corresponding to the original ads.

Slide16

Click-spam measurement: validation

The complementary click-spam rates estimated in this paper are plotted as error bars, and those reported by Google and Bing are plotted as bars.

The estimates match for the yoga and lawnmower ads.

The mismatch was due to a search-redirection virus hijacking normal user searches.

Normalized estimate for search ads

Slide17

Interstitial Page Performance

The plot shows, for each interstitial page, the fraction of clicks that reach the landing page for the “celebrity” ad and for the corresponding control ad.

Except for the CAPTCHA interstitial, the fraction reaching the landing page is significantly lower for the control ad than for the original ad; hence assumption 1 holds.

The graphs show a balanced number of converters through the interstitial and direct paths, supporting assumption 2.

Interstitial Page Performance

Slide18

Gold standard users

The graph compares 3 different definitions of a gold-standard user.

The first is 5s of dwell time and 1 mouse event. The second is 15s of dwell time and 5 mouse events. The third is 30s of dwell time and 15 mouse events.
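
The three definitions can be expressed as threshold rules. The sketch below is illustrative only: it assumes each threshold means "at least" that much dwell time and that many mouse events, and the names `GoldStandardRule`, `RULES`, and `gold_standard_fraction`, as well as the sample visits, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldStandardRule:
    min_dwell_seconds: float
    min_mouse_events: int

    def matches(self, dwell_seconds: float, mouse_events: int) -> bool:
        # Assumption: both thresholds are lower bounds that must be met together.
        return (dwell_seconds >= self.min_dwell_seconds
                and mouse_events >= self.min_mouse_events)


# The three definitions from the slide, loosest to strictest.
RULES = {
    "first": GoldStandardRule(5, 1),
    "second": GoldStandardRule(15, 5),
    "third": GoldStandardRule(30, 15),
}


def gold_standard_fraction(visits, rule):
    """Fraction of (dwell_seconds, mouse_events) visits that satisfy the rule."""
    if not visits:
        return 0.0
    hits = sum(1 for dwell, events in visits if rule.matches(dwell, events))
    return hits / len(visits)


# Hypothetical landing-page visits: (dwell time in seconds, mouse events).
visits = [(2, 0), (8, 3), (20, 6), (45, 20)]
fractions = {name: gold_standard_fraction(visits, rule) for name, rule in RULES.items()}
print(fractions)
```

A stricter definition admits fewer users, so the fraction can only stay the same or shrink from the first definition to the third.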

The fraction of gold-standard users under the 2nd and 3rd definitions is 0 for the control ad, which supports the 3rd assumption.

Gold Standard User definition

Slide19

Fingerprinting click-spam

To prioritize manual investigation of the large number of clicks, simple graph clustering is applied over features in the HTTP request, and the heavy-hitter clusters are identified.
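
The idea of grouping clicks by request features and surfacing the largest groups can be sketched as below. This is a deliberately simplified stand-in, not the paper's graph-clustering procedure: it groups clicks by an exact feature tuple, and the choice of features (referring host and user agent), the 50% threshold, and the sample clicks are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


def fingerprint(click: dict) -> tuple:
    """Reduce one click's HTTP request to a coarse feature tuple (hypothetical features)."""
    host = urlsplit(click["referer"]).hostname or ""
    return (host, click["user_agent"])


def heavy_hitters(clicks: list, min_share: float = 0.5) -> list:
    """Return (fingerprint, count) pairs whose share of all clicks is at least min_share."""
    counts = Counter(fingerprint(c) for c in clicks)
    total = len(clicks)
    return [(fp, n) for fp, n in counts.most_common() if n / total >= min_share]


# Hypothetical click log using reserved example domains.
clicks = [
    {"referer": "http://parked.example/landing", "user_agent": "UA-1"},
    {"referer": "http://parked.example/other", "user_agent": "UA-1"},
    {"referer": "http://parked.example/", "user_agent": "UA-1"},
    {"referer": "http://news.example/story", "user_agent": "UA-2"},
    {"referer": "http://blog.example/post", "user_agent": "UA-3"},
]
suspects = heavy_hitters(clicks)
print(suspects)
```

The groups returned this way are candidates for manual investigation, not confirmed click-spam.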

Groups of websites with identical layouts, hosted on unrelated domains, were found driving traffic towards these sites.

Figure (a) plots the clustering and heavy-hitter output, (b) plots the clusters from the mobile ad network, and (c) lists the top 5 heavy-hitter clusters.

Graph clustering and heavy-hitter detection output

Slide20

thespecialsearch.com affiliates

For every search performed through the browser, the malware contacted a specific IP address with the URL: http://63.223.106.16/bV03tDze8…JpdHk=08h. The string of random-looking characters is encoded using base64. Decoding reveals that, for each search, the malware reports back to its C&C server the version number of the bot, the affiliate ID, and the search engine.
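
The decoding step can be illustrated with a short sketch. The real token is truncated in the slide and its plaintext layout is not given, so this example builds its own token: the `key=value` layout, the field names `ver`, `aff`, and `se`, and their values are hypothetical, as is the helper name `decode_beacon`.

```python
import base64
from urllib.parse import parse_qs


def decode_beacon(token: str) -> dict:
    """Base64-decode a token and parse it as key=value pairs (assumed layout)."""
    padded = token + "=" * (-len(token) % 4)  # restore any stripped padding
    raw = base64.urlsafe_b64decode(padded).decode("utf-8")
    return {key: values[0] for key, values in parse_qs(raw).items()}


# Hypothetical beacon carrying a bot version, an affiliate ID, and a search engine.
sample_token = base64.urlsafe_b64encode(b"ver=1&aff=42&se=examplesearch").decode("ascii")
fields = decode_beacon(sample_token)
print(fields)
```

Base64 is an encoding rather than encryption, so anyone who observes such a request can recover the reported fields.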

In response, the C&C server sends back an XML document that contains the encoded URL to click and the HTTP Referer values.

First, the browser shows the unmodified results as requested.

When the user clicks an ad, the bot kicks into action in the background. The bot contacts the URL it was directed to, with the appropriate Referer header, which leads to a sequence of HTTP and JavaScript redirects.

Slide21

Click-spam through TDL-4 botnet

The bot then ceased this behavior for 24 hours, as long as the IP address stayed the same.

The next day the bot would repeat the activity, perform one click, and go dormant again.

It seems clear that the C&C server ensures that, across the botnet, each IP address is used for only one click in a 24-hour period, operating at an extremely low threshold that would likely not raise any flags.

The bot’s clicks are also easily disguised as those of the user.

Slide22

Parked Domains

Parked Domains: A parked domain is a domain name that is registered but not in use. The registrar typically points DNS for that domain name to a web server that serves some message followed by a set of ads that the user may or may not click.

sedo.com parkers: A domain name registered by sedo was either never used or was vacated by its previous owner, and now serves the parked page to users reaching the domain name.

A user may reach the parked page by mistyping the domain name (icicbank.com instead of icicibank.com) or through links to the domain that are still available in forums or indexed in search engines.

On reaching the parked domain, the user is usually shown an ad-laden page.

Slide23

Sedo.com parkers

Based on the user’s geographic location, sedo serves JavaScript code that in effect automatically clicks the first ad link without even serving the parking page.

The automatic click initiates a chain of redirects, many of which culminate in a redirect to clicks.scour.com, which then shows ads from a major ad network.

The remaining auto-redirects reach either thespecialsearch.com, encountered earlier in the context of malware, or searchmirror.com.

In many cases the first ad is for the correct version of the mistyped domain, and the user is automatically redirected to the correct website. The advertiser (the correct domain) thus pays third parties for a user who wanted to reach its web page anyway.

Slide24

Advertising arbitrage

The cluster of dotellall.com and 20 other related domains accounts for 18% of the traffic for the search control.

These are a family of fake sites that advertise heavily on search and contextual ad networks. They advertise to the tune of thousands of ads for a wide spread of (long-tail) keywords.

When a user clicks on one of these ads, they are taken to a site that shows ads only when the user arrives through click-spam.

This family of sites also acts as a publisher for search ad networks.

Arbitraging click-spam traffic through fake sites

Slide25

Future work

There are more classes of dubious traffic lurking in the data than the ones discussed in this paper, which leaves room for future work.

Research can be conducted on the technological component of this problem to proactively discover attacks.

For mobile, much of the telemetry needed to detect click-spam is currently non-existent, so future work can also be done in this space.

Slide26

Conclusion

The paper developed click-spam measurement methodologies and tested them on data gathered from ten major ad networks and four types of ads.

It identifies and analyzes seven ongoing click-spam attacks not caught by any major ad network.

The authors conclude that click-spam is a serious problem, especially rampant in the mobile advertising context.

They believe it to be an open problem that requires a concerted effort from the research community.

They also release the data gathered for this paper to aid other researchers.

Slide27