Automated Experiments on Ad Privacy Settings
By Ajinkya Thorve
Introduction
Advances in tracking technologies have led to increased data collection.
Collected data is used, sold, and resold for serving targeted advertisements.
Serious privacy concern!
To increase transparency and provide control:
https://www.google.com/settings/ads
Google Ad Settings Page
Ref: http://suite4social.com/make-googles-ad-settings-work-for-you/
The Problem
Little information about how these pages operate.
Need to explore how user behavior (either directly with the Ad Settings or with content providers) alters the ads and settings shown to the user.
Need to study the degree to which the settings provide transparency and choice, as well as to check for the presence of discrimination.
Privacy Properties
1. Discrimination
Discrimination between two classes is a difference in behavior towards those two classes.
Membership in a class causes a change in ads.
Discrimination is not always bad (e.g. clothing ads).
Privacy Properties (contd.)
2. Transparency
Display to users what the ad network may have learned about them.
Cannot expect an ad network to be completely transparent.
Only study the extreme case of the lack of transparency — opacity.
If some browsing activity results in a significant effect on the ads served, but has no effect on the ad settings — lack of transparency.
Privacy Properties (contd.)
3. Choice
Effectful choice: Altering the settings has some effect on the ads seen by the user. This shows that altering the settings is not merely a “placebo button”; it has a real effect on the network’s ads.
Ad choice: Removing an inferred interest results in a decrease in the number of ads related to the removed interest.
Methodology
Null hypothesis: Inputs do not affect the outputs.
Inputs:
User Behavior, Ad Settings
Output: Ads seen by the user
The goal: To establish that changes in a certain type of input to a system cause an effect on a certain type of output of the system.
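The methodology above amounts to a permutation test: if the null hypothesis holds, randomly relabeling the agents should not change the test statistic. A minimal sketch with a toy statistic and toy data (the paper's actual statistic is based on classifier accuracy, not a difference in means):

```python
import random

def permutation_test(group_a, group_b, statistic, trials=1000, seed=0):
    """Estimate a p-value for the null hypothesis that group membership
    does not affect the measurements."""
    rng = random.Random(seed)
    observed = statistic(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n = len(group_a)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # relabel agents at random
        if statistic(pooled[:n], pooled[n:]) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (trials + 1)

# Toy statistic: absolute difference in mean ad counts per group.
diff_means = lambda a, b: abs(sum(a) / len(a) - sum(b) / len(b))
p = permutation_test([5, 6, 7, 6], [2, 1, 2, 3], diff_means)
```

A small p-value lets us reject the null hypothesis that the inputs do not affect the outputs.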
AdFisher
An automated tool to run experiments using the above methodology for a set of treatments, measurements, and classifiers.
Extensible: allows the experimenter to implement additional functionality or even study a different online platform.
AdFisher (contd.)
To simulate a new person, AdFisher creates an agent from a fresh browser instance with no browsing history, cookies, or other personalization.
To simulate interests, AdFisher downloads the top 100 URLs for different categories from Alexa and creates lists of webpages.
AdFisher randomly assigns each agent to a group and applies the appropriate treatment.
Next, AdFisher takes measurements from the agent, parsing the page to find the ads shown by Google and storing them.
10 reloads, 5 s between successive reloads.
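The assign-treat-measure loop above can be sketched as follows. Here `treatment`, `control`, and `collect` are placeholders for the real browser automation (the actual tool drives fresh browser instances), and the 5-second delay between reloads is omitted:

```python
import random

def run_experiment(n_agents, treatment, control, collect, reloads=10, seed=1):
    """Randomly assign agents to two groups, apply each group's treatment,
    then take repeated ad measurements (10 reloads in the slides)."""
    rng = random.Random(seed)
    agents = list(range(n_agents))
    rng.shuffle(agents)                      # random group assignment
    half = n_agents // 2
    groups = {"experimental": agents[:half], "control": agents[half:]}
    measurements = {}
    for name, members in groups.items():
        apply_treatment = treatment if name == "experimental" else control
        measurements[name] = []
        for agent in members:
            apply_treatment(agent)           # e.g. visit category webpages
            ads = [collect(agent) for _ in range(reloads)]
            measurements[name].append(ads)
    return measurements
```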
AdFisher (contd.)
News sites were chosen since they generally show many ads. Among the top 20 news websites on alexa.com, only five displayed text ads served by Google.
Most experiments were run on the Times of India, as it serves the most (five) text ads per page reload.
Some experiments were repeated on the Guardian (three ads per reload) to demonstrate that the results are not specific to one site.
AdFisher (contd.)
It splits the entire data set into training and testing subsets, and examines the training subset of the collected measurements to select a classifier that distinguishes between the measurements taken from each group.
AdFisher has functions for converting the text ads seen by an agent into three different feature sets: the URL feature set, the URL+Title feature set, and the word feature set.
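Assuming each collected ad is a (url, title, text) triple, the three feature sets might be derived along these lines (a sketch, not the tool's actual parser):

```python
def to_features(ads, mode="url+title"):
    """Convert a list of (url, title, text) ads into a bag-of-features
    count dictionary for one of the three feature sets."""
    counts = {}
    for url, title, text in ads:
        if mode == "url":
            features = [url]
        elif mode == "url+title":
            features = [url + " | " + title]
        else:  # "words": individual tokens from the ad's text
            features = text.lower().split()
        for f in features:
            counts[f] = counts.get(f, 0) + 1
    return counts
```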
AdFisher (contd.)
Explored a variety of classification algorithms provided by the scikit-learn library.
Logistic regression with an L2 penalty over the URL+title feature set consistently performed well compared to the others.
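A minimal scikit-learn version of this classification step, run here on hypothetical URL+title strings (the real inputs are the ads collected from each group):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical URL+title strings standing in for each group's ads.
group_a = ["jobs.example | Senior Executive Roles",
           "jobs.example | Executive Career Coaching"] * 5
group_b = ["jobs.example | Entry Level Openings",
           "jobs.example | Part Time Work"] * 5
texts = group_a + group_b
labels = [0] * len(group_a) + [1] * len(group_b)

# Bag-of-words features; LogisticRegression applies an L2 penalty by default.
X = CountVectorizer().fit_transform(texts)
clf = LogisticRegression(penalty="l2").fit(X, labels)
accuracy = clf.score(X, labels)   # training accuracy on this toy data
```

In the actual methodology the classifier is chosen on a training subset and its accuracy on held-out measurements serves as the test statistic for the significance test.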
Experiments
1. Discrimination
Set up AdFisher to have the agents in one group visit the Google Ad Settings page and set the gender bit to female, while agents in the other group set theirs to male.
All the agents then visited the top 100 websites listed under the Employment category of Alexa.
The agents then collected ads from the Times of India.
The learned classifier attained a test accuracy of 93%, suggesting that Google did in fact treat the genders differently.
Experiments (contd.)
2. Transparency
The experimental group visited substance-abuse websites while the control group idled.
None of the 500 agents in the experimental group had interests related to substance abuse on their Ad Settings pages.
Collected the ads shown to the agents.
Experiments (contd.)
3. Effectful Choice
Tested whether making changes to Ad Settings has an effect on the ads seen, thereby giving users a degree of choice over the ads.
Simulated an interest in online dating by visiting the website www.midsummerseve.com.
Agents in the experimental group removed the interest “Dating & Personals”.
All the agents then collected ads from the Times of India.
Found statistically significant differences between the groups.
Thus, the ad settings appear to actually give users the ability to avoid ads they might dislike or find embarrassing.
Conclusions
Conducted 21 experiments using 17,370 agents that collected over 600,000 ads.
Found instances of discrimination, opacity, and choice in targeted ads.
Cannot assign blame; cannot determine whether Google, the advertiser, or complex interactions among them caused the issues, as we lack the access needed to make this determination.
My Understanding and Issues
Only a few thousand browser agents were used, so the results cannot be generalized.
“...we do not claim these findings to generalize or imply widespread issues, we find them concerning and warranting further investigation by those with visibility into the ad ecosystem.”
“We do not claim that we will always find a difference if one exists, nor that the differences we find are typical of those experienced by users.”
My Understanding and Issues (contd.)
Limitations of the experiment: Only text ads, only two websites.
“It comes with stock functionality for collecting and analyzing text ads. Experimenters can add methods for image, video, and flash ads.”
“The experimenter can add parsers to collect ads from other websites.”
My Understanding and Issues (contd.)
All agents run from the same IP address.
“We do not claim “completeness” or “power”: we might fail to detect some use of information.”
“For example, Google might not serve different ads upon detecting that all the browser agents in our experiment are running from the same IP address. Despite this limitation in our experiments, we found interesting instances of usage.”
References
Datta, A., Tschantz, M., & Datta, A. (2015). Automated Experiments on Ad Privacy Settings. Proceedings on Privacy Enhancing Technologies, 2015(1), pp. 92-112.
Thank You!