MAdFraud: Investigating Ad Fraud in Android Applications
PowerPoint presentation, uploaded 2017-04-20



Presentation Transcript

MAdFraud
Swaraj Wankhade

Investigating Ad Fraud in Android Applications

INTRODUCTION

Many Android applications are distributed for free but are supported by advertisements. Ad libraries embedded in the app fetch content from the ad provider and display it on the app's user interface. The ad provider pays the developer for ads displayed to the user and for ads clicked by the user.

A major threat to this ecosystem is ad fraud, where a miscreant's code fetches ads without displaying them to the user or "clicks" on ads automatically. Ad fraud has been studied extensively in the context of web advertising but has gone largely unstudied in the context of mobile advertising.

STUDY

We take the first step toward studying mobile ad fraud perpetrated by Android apps. We identify two fraudulent ad behaviors in apps:

1) requesting ads while the app is in the background, and
2) clicking on ads without user interaction.

Based on these observations, we developed an analysis tool, MAdFraud, which automatically runs many apps simultaneously in emulators to trigger and expose ad fraud.

STUDY

Since the formats of ad impressions and clicks vary widely between ad providers, we develop a novel approach for automatically identifying ad impressions and clicks in three steps: building HTTP request trees, identifying ad request pages using machine learning, and detecting clicks in HTTP request trees using heuristics.

We apply our methodology and tool to two datasets: 1) 130,339 apps crawled from 19 Android markets, including Play and many third-party markets, and 2) 35,087 apps that likely contain malware, provided by a security company.

STUDY

From analyzing these datasets, we find that about 30% of apps with ads make ad requests while running in the background. In addition, we find 27 apps that generate clicks without user interaction. The click-fraud apps attempt to remain stealthy when fabricating ad traffic by sending clicks only periodically and by changing which ad provider is targeted between installations.

ANDROID APP ADVERTISING

The developer must register with an Android ad provider, which provides the developer with a publisher ID and an ad library to include in their app. The library is responsible for fetching and displaying ads while the app is running.

Requesting an ad from an app is analogous to doing so on the web: an ad request is made over HTTP to the ad server and includes the developer's publisher ID and user-targeting information. The ad server returns the ad's content URL, click URL, and any tracking-pixel URLs, which must be fetched to display the ad.
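The request/response exchange described above can be sketched as follows. The provider hostname, query-parameter names, and response fields here are hypothetical, since every ad network uses its own format; the sketch only illustrates the roles of the publisher ID, content URL, click URL, and tracking pixels.

```python
import json
from urllib.parse import urlencode

# Hypothetical ad server; real providers each use their own hostnames,
# parameter names, and response formats.
AD_SERVER = "http://ads.example.com/mob"

def build_ad_request(publisher_id, targeting):
    """Compose the ad request URL sent by the embedded ad library."""
    params = {"pub": publisher_id, **targeting}
    return AD_SERVER + "?" + urlencode(params)

def parse_ad_response(body):
    """Extract the URLs the library must fetch to display the ad."""
    ad = json.loads(body)
    return ad["content_url"], ad["click_url"], ad.get("tracking_pixels", [])

url = build_ad_request("pub-1234", {"os": "android", "lang": "en"})
content_url, click_url, pixels = parse_ad_response(
    '{"content_url": "http://cdn.example.com/banner.png",'
    ' "click_url": "http://ads.example.com/click?id=42",'
    ' "tracking_pixels": ["http://ads.example.com/px.gif"]}'
)
```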

METHODOLOGY

Running Android Apps: We run the apps in our dataset in an Android emulator (API 17) and capture their network traffic. For each app, we create a new emulator image, install the app on the new emulator, run the app in the foreground for 60 seconds, put the app into the background, and run it for another 60 seconds. To put the app in the background, we issue an intent to the emulator to open the browser on a static page hosted on our server.
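The per-app run described above can be outlined with standard Android SDK tooling. This is a minimal sketch, assuming `adb` and `emulator` are available; the AVD name and marker URL are placeholders, the 60-second timings come from the slide, and the commands are only assembled into a plan, not executed, so the sequence is easy to inspect.

```python
# Sketch of the per-app measurement loop: fresh emulator image, 60 s with the
# app in the foreground, then an intent opening the browser on our marker
# page, which pushes the app into the background for 60 s more.
MARKER_URL = "http://our-server.example.com/marker.html"  # placeholder

def measurement_plan(apk_path, package_name):
    return [
        ["emulator", "-avd", "madfraud-api17", "-wipe-data"],  # fresh image
        ["adb", "install", apk_path],
        ["adb", "shell", "monkey", "-p", package_name, "1"],   # launch app
        ["sleep", "60"],                                       # foreground phase
        # Background the app by opening the browser on the marker page:
        ["adb", "shell", "am", "start", "-a", "android.intent.action.VIEW",
         "-d", MARKER_URL],
        ["sleep", "60"],                                       # background phase
    ]

plan = measurement_plan("app.apk", "com.example.app")
```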

METHODOLOGY

The HTTP request to this static page marks the boundary between the app's foreground and background activity in the captured network traffic. Because we install only one third-party app on the emulator, all captured traffic that is not from the emulator itself (evident from a nonstandard TCP port) or from the browser (evident from our server's IP address in the IP header's source or destination) is attributed to that app. We choose not to interact with the app (i.e., no touch events), even when it runs in the foreground, to ensure that any ad clicks are generated without user interaction.
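The attribution rules above can be sketched as a small classifier over captured flows. The IP addresses and port set are illustrative, and the rule directions are my reading of the slide: traffic to or from our server belongs to the browser, traffic on a nonstandard port belongs to the emulator infrastructure, and everything else is attributed to the single installed app.

```python
# Attribute each captured flow: our server's IP marks browser traffic (the
# backgrounding marker page), nonstandard ports mark emulator-internal
# traffic, and the remainder belongs to the one third-party app installed.
OUR_SERVER_IP = "203.0.113.7"        # hosts the static marker page (example)
STANDARD_PORTS = {80, 443, 53}

def attribute_flow(src_ip, dst_ip, dst_port):
    if OUR_SERVER_IP in (src_ip, dst_ip):
        return "browser"             # marks the foreground/background boundary
    if dst_port not in STANDARD_PORTS:
        return "emulator"            # nonstandard port: emulator-internal
    return "app"                     # only one third-party app is installed

labels = [attribute_flow("10.0.2.15", "203.0.113.7", 80),
          attribute_flow("10.0.2.15", "198.51.100.9", 5555),
          attribute_flow("10.0.2.15", "198.51.100.9", 80)]
```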

METHODOLOGY

To analyze the packets captured by MAdFraud, we use the Bro Network Security Monitor. Using Bro, we can reconstruct TCP flows and extract application-protocol entities for HTTP and DNS traffic. For HTTP, these entities include fields from the HTTP request and response. From the HTTP request, we extract the header fields as well as the URL and request body. From the HTTP response, we record the status code, response type, and any URLs in the unzipped response body.

REQUEST TREES

Intelligently constructing an HTTP request tree is important because it enables a number of techniques for automatically analyzing ad traffic, such as automatically detecting clicks. We represent each HTTP request in an app's network traffic by a node in a request tree, and we connect two nodes if and only if they are related according to three rules. The first two rules are based on the HTTP protocol specification:

1) The client may set the request's referrer field to indicate to the server the URL that contained the requested URL, so we consider the former URL the parent of the latter.

REQUEST TREES

2) The server may set the Location header along with a redirection status code to redirect the client to another URL, so we consider the original URL the parent of the redirected URL.

3) Finally, to account for cases where the referrer header is missing, we extract all URLs in the HTTP response body of a node and consider that node the parent of all of those URLs.
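The three linking rules above can be sketched in a few lines. The record format is my own simplification of real capture data, assuming each request carries its URL, an optional Referer, an optional redirect Location, and the URLs extracted from its response body.

```python
# Link captured HTTP requests into request trees using the three rules:
# Referer header, Location redirect, and URLs found in a response body.
def build_request_forest(requests):
    """requests: list of dicts with keys url, referrer, location, body_urls."""
    parent = {}                                    # child URL -> parent URL
    by_url = {r["url"]: r for r in requests}
    for r in requests:
        # Rule 1: the Referer names the page that contained this URL.
        if r.get("referrer") in by_url:
            parent.setdefault(r["url"], r["referrer"])
    for r in requests:
        # Rule 2: a redirect Location makes the original URL the parent.
        loc = r.get("location")
        if loc in by_url:
            parent.setdefault(loc, r["url"])
        # Rule 3: URLs extracted from the response body become children.
        for u in r.get("body_urls", []):
            if u in by_url:
                parent.setdefault(u, r["url"])
    return parent

reqs = [
    {"url": "http://ads.example.com/ad",
     "body_urls": ["http://ads.example.com/click?id=1"]},
    {"url": "http://ads.example.com/click?id=1",
     "location": "http://landing.example.com/"},
    {"url": "http://landing.example.com/",
     "referrer": "http://ads.example.com/click?id=1"},
]
tree = build_request_forest(reqs)
```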

AD REQUEST PAGE CLASSIFIER

To identify ad fraud, we must first identify impressions in app network traffic, which begin with an ad request. We develop an approach for automatically identifying impressions using machine learning. From manually examining mobile ads in our previous work, we know that ad requests have a common, characteristic format.

We classify request pages, identified by the host and path names, i.e., the portion of the URL before the '?' character that denotes the beginning of the query parameters. This allows us to extract features over the aggregate of all requests to each request page. We then classify whether each page is for requesting ads.
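Extracting the request page (host plus path, i.e. everything before the '?') is straightforward with the standard library. In this sketch the per-page "feature" is just the query-parameter count of each request, a stand-in for the fuller feature set a real classifier would aggregate.

```python
from urllib.parse import urlsplit
from collections import defaultdict

# Group requests by "request page" (host + path, the URL up to the '?') so
# features can be computed over the aggregate of requests to the same page.
def request_page(url):
    parts = urlsplit(url)
    return parts.netloc + parts.path

def aggregate_by_page(urls):
    pages = defaultdict(list)
    for url in urls:
        q = urlsplit(url).query
        # Stand-in feature: number of query parameters in each request.
        pages[request_page(url)].append(len(q.split("&")) if q else 0)
    return dict(pages)

pages = aggregate_by_page([
    "http://ads.example.com/mob?pub=1&os=android",
    "http://ads.example.com/mob?pub=2&os=android&lang=en",
    "http://cdn.example.com/banner.png",
])
```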

FINDING IMPRESSIONS AND CLICKS

With the classified ad request pages and the HTTP request trees, we can extract impressions and clicks for each app. To accurately measure the number of impressions, we count an ad request as an impression only when none of its ancestors in the request tree is an ad request.

Besides impression fraud, another, perhaps more lucrative, revenue source for misbehaving apps is click fraud, where an app fabricates fraudulent clicks.
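The impression-counting rule above can be sketched directly over a child-to-parent map: walk each ad request's ancestors and count it only if none of them is itself an ad request. The parent-map representation and the `is_ad_request` predicate are my own simplifications.

```python
# Count an ad request as an impression only if no ancestor in the request
# tree is also an ad request, so one displayed ad that fans out into several
# ad-server fetches is not counted multiple times.
def count_impressions(parent, is_ad_request):
    """parent: dict mapping child URL -> parent URL."""
    impressions = 0
    for url in parent.keys() | set(parent.values()):
        if not is_ad_request(url):
            continue
        node, ancestor_is_ad = url, False
        while node in parent:                 # walk up to the root
            node = parent[node]
            if is_ad_request(node):
                ancestor_is_ad = True
                break
        if not ancestor_is_ad:
            impressions += 1
    return impressions

parent = {"http://ads.example.com/ad2": "http://ads.example.com/ad1",
          "http://cdn.example.com/img": "http://ads.example.com/ad2"}
n = count_impressions(parent, lambda u: "/ad" in u)  # only ad1 counts
```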

There are two ways for apps to fabricate clicks without user interaction. First, the app may generate a touch event on the ad to trick the ad library into processing it as a user click. Second, the app may parse the response body of the ad request, extract the click URL, and then make an HTTP request to the click URL.

To detect either of these cases, we apply rules to the subtrees of ad request nodes in our request trees to determine whether there is a path from an ad request node to a click node. We look for an HTTP redirection in the subtree of an ad request node that ends on a node that received HTML as its response MIME type.

To handle these cases, as well as marketers that have HTTPS landing pages, we infer that an ad request whose subtree contains a redirection to a page with a scheme other than http:// contains a click.
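Both click heuristics operate on the subtree beneath an ad request node. A minimal sketch, using the same simplified record style as earlier: each subtree node here records whether it was reached via a redirect, its response MIME type, and its URL scheme, which is enough to express the two rules.

```python
# Detect a click under an ad request node using the two heuristics above: a
# redirection in the subtree that ends on an HTML response, or a redirection
# whose target has a scheme other than http:// (e.g. an https:// landing
# page). Node records are a simplification of real capture data.
def subtree_contains_click(nodes):
    """nodes: dicts for the ad request's subtree, each with keys
    via_redirect (bool), mime (str), and scheme (str)."""
    for n in nodes:
        if n["via_redirect"] and n["mime"] == "text/html":
            return True                  # redirect chain ends on HTML
        if n["via_redirect"] and n["scheme"] != "http":
            return True                  # non-http landing page
    return False

benign = [{"via_redirect": False, "mime": "image/png", "scheme": "http"}]
clicked = [{"via_redirect": True, "mime": "text/html", "scheme": "http"}]
https_landing = [{"via_redirect": True, "mime": "", "scheme": "https"}]
```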

FINDINGS

Ad Fraud: We start by investigating which apps made ad requests while in the background. We build request trees from the network captures of the apps and find impressions by looking for classified ad request pages in the request trees. Using the list of ad request pages identified by our classifier, we find that 40,409 of our crawled apps generated a total of 274,128 impressions.

Finally, the periodicity of ad requests continues after apps have been put into the background, implying that some apps continue to run the ad library after losing focus. We expect that some of these cases are due to misconfiguration; however, we do not attempt to determine intent. Regardless, requesting ads while the user is not using the app is undesirable behavior, as it wastes device bandwidth and battery life and may affect the ad provider's bookkeeping.

CLICK FRAUD

We can now use the 274,128 request trees containing impressions that were found in the previous section to find apps that fabricate clicks. Using the click rules described above, we find that 21 apps fabricate a total of 59 clicks. Of these, 24 clicks were performed in the foreground and 35 in the background, indicating that the apps fabricating clicks continue doing so regardless of whether they are on the screen. We manually investigate the HTTP traffic for these 21 apps to confirm that they were all indeed click requests.

FINDINGS

[Table of click-fraud apps and their ad providers; not reproduced in the transcript.]

There are a number of surprising results in this table. First, three apps on Google Play, collectively with thousands of installs, fabricated clicks. When we looked up the apps on the market, we found that only one of the three was still available. Second, a number of apps have very similar numbers of impressions and clicks; we speculate that the same miscreant may have uploaded separate apps with the same click-fraud code. Third, only three ad providers appear in the table. This could be for a variety of reasons, including: 1) these ad providers are easy to sign up for, or 2) these ad providers have less sophisticated click-fraud detection. Fourth, apps fabricate clicks to at most one ad provider during the experiment. To reduce their risk of being detected, we would expect miscreants to rotate between different ad providers, but this does not seem to be the case.

LIMITATIONS

We first discuss three limitations that may have led us to underestimate the prevalence of fraudulent behavior in apps. First, we ran apps in emulators instead of on real devices; some ad libraries may refuse to display ads while in an emulator, and some fraudulent apps may withhold fraudulent traffic to avoid being analyzed. Second, we do not interact with the apps, and thus we may not reach a UI state where an app would perform fraud. Third, we ran all our emulators behind a single static IP address, so it is possible that some ad providers blocked our IP address during our experiments.

CONCLUSION

We have taken the first step toward studying mobile ad fraud at a large scale. We developed a system and approach, MAdFraud, for running mobile apps, capturing their network traffic, and identifying ad impressions and clicks. To deal with the wide variety of formats of ad impressions and clicks, we proposed a novel approach for automatically identifying them in three steps. We discovered and analyzed various fraudulent behaviors in mobile ads.

THANK YOU.
