Using Data Privacy for Better Adaptive Predictions

Vitaly Feldman (IBM Research – Almaden)
Foundations of Learning Theory, 2014

Joint work with Cynthia Dwork (MSR SVC), Moritz Hardt (IBM Almaden), Omer Reingold (MSR SVC), and Aaron Roth (UPenn, CS)
Statistical inference

Genome-Wide Association Studies
- Given: DNA sequences with medical records
- Discover: find SNPs associated with diseases
- Predict chances of developing some condition
- Predict drug effectiveness

Hypothesis testing

Given: samples drawn i.i.d. from an unknown distribution P over a domain X.
Output: a solution together with a stated value, with a guarantee that w.p. ≥ 1 − β (over the samples) the stated value is close to the true value under P.
Existing approaches

Theoretical ML
- Uniform convergence bounds for the solution class: for every hypothesis in a class of complexity d, the empirical estimate is within τ of the true value w.p. ≥ 1 − β
- Stability-based bounds on the output
- But often too loose in practice (they do not exploit additional structure) and complicated to derive

Practical ML
- Cross-validation

Statistics
- Model-specific fit and significance tests
- Bootstrapping, etc.
Real world is interactive

Outcomes of analyses inform future manipulations on the same data:
- Exploratory data analysis
- Model selection
- Feature selection
- Hyper-parameter tuning
- Public data: findings inform others

Samples are no longer i.i.d.!
Is the issue real?

Freedman's paradox (1983):
- Data: a response variable together with many candidate explanatory variables, all pure noise
- Throw away the variables uncorrelated with the response
- Perform least-squares regression over the remaining variables

"Such practices can distort the significance levels of conventional statistical tests. The existence of this effect is well known, but its magnitude may come as a surprise, even to a hardened statistician."
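The effect is easy to reproduce. A minimal simulation (illustrative only; the sample sizes and the screening threshold are arbitrary choices, not Freedman's original design): screen pure-noise variables by their correlation with a pure-noise response, then regress on the survivors, and the fit looks meaningful.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 200  # fewer samples than candidate variables

# Pure noise: the response y is independent of every explanatory variable.
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# Step 1 (screening): keep only the variables that look correlated with y.
corr = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
selected = np.flatnonzero(np.abs(corr) > 0.15)

# Step 2: least-squares regression on the survivors only.
Xs = X[:, selected]
beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
resid = y - Xs @ beta
r2 = 1.0 - (resid @ resid) / (y @ y)

print(selected.size, "of", p, "noise variables survive screening; R^2 =", round(r2, 2))
```

The reported R² is substantially above zero even though the true R² is exactly zero: the screening step was chosen adaptively, using the same data as the final fit.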
Kaggle competitions

The data is split into a public part and a private part: submissions receive constant feedback via a public score computed on the public data, while the final ranking uses a private score computed on the private data.

http://www.rouli.net/2013/02/five-lessons-from-kaggles-event.html

"If you based your model solely on the data which gave you constant feedback, you run the danger of a model that overfits to the specific noise in that data." – Kaggle FAQ
Adaptive statistical queries [K93, FGRVX13]

SQ oracle: for a query function φ : X → [0,1], return a value v such that |v − E_{x∼P}[φ(x)]| ≤ τ, where τ is the tolerance of the query. With probability ≥ 1 − β this guarantee holds for all queries asked by the learning algorithm(s).

Statistical queries can measure error/performance and test hypotheses, and can be used in place of samples in most algorithms!
SQ algorithms

PAC learning algorithms (except parities), convex optimization (ellipsoid, iterative methods), expectation maximization (EM), SVM (with kernel), PCA, ICA, ID3, k-means, method of moments, MCMC, Naïve Bayes, neural networks (backprop), perceptron, nearest neighbors, boosting.

[K 93, BDMN 05, CKLYBNO 06, FPV 14]
Answering SQs from samples

For a query φ, respond with the empirical mean (1/n) Σᵢ φ(xᵢ).

How many samples are needed to answer k queries of tolerance τ?
- If the k queries are fixed in advance: naïve answering with n = O(log(k)/τ²) suffices (Chernoff bound + union bound).
- With one round of adaptivity (constant τ and β)?
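For the fixed (non-adaptive) case, the Chernoff-plus-union-bound sample size can be sketched and checked directly (a toy illustration; the uniform distribution and threshold queries are arbitrary choices, not from the talk):

```python
import numpy as np

def samples_for_fixed_queries(k, tau, beta):
    """Hoeffding + union bound: n = ceil(ln(2k/beta) / (2 tau^2)) samples
    answer k FIXED [0,1]-valued queries within tau, w.p. >= 1 - beta."""
    return int(np.ceil(np.log(2 * k / beta) / (2 * tau ** 2)))

k, tau, beta = 1000, 0.05, 0.05
n = samples_for_fixed_queries(k, tau, beta)

# Empirical check on a toy domain: P = uniform on [0, 1),
# queries phi_t(x) = 1{x < t}, whose true expectation is t.
rng = np.random.default_rng(1)
data = rng.random(n)
thresholds = rng.random(k)  # the k queries are chosen before seeing the data
answers = np.array([(data < t).mean() for t in thresholds])
max_err = np.abs(answers - thresholds).max()
print(n, max_err)
```

Note the logarithmic dependence on k: a thousand fixed queries need only ~2,000 samples here. The same guarantee breaks down once the queries depend on earlier answers.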
Our result

There exists an algorithm that can answer k adaptively chosen SQs such that, with probability ≥ 1 − β, the answers are τ-valid using n = Õ(√k) samples (for constant τ and β). The algorithm runs in time polynomial in n and in the domain size |X|.

Also: this cannot be achieved efficiently. There is a lower bound for polynomial-time algorithms under cryptographic assumptions [HU14].
Fresh samples

A data set analyzed differentially privately can be treated like fresh samples.
Privacy-preserving data analysis

How to get utility from data while preserving the privacy of individuals.
Differential Privacy [DMNS06]

Each sample point is created from the personal data of an individual, e.g. (GTTCACG…TC, "YES").

A (randomized) algorithm A is ε-differentially private if for any two data sets D, D′ that differ in a single element: if S is any set of outputs, then

  Pr[A(D) ∈ S] ≤ e^ε · Pr[A(D′) ∈ S]
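As a concrete instance of the definition, here is randomized response, a classic ε-DP mechanism for a single sensitive bit (a standard textbook example, not from the talk). The ratio of output probabilities on the two neighboring inputs is exactly e^ε:

```python
import math
import random

def randomized_response(bit, eps, rng):
    """Report the true bit with probability e^eps / (1 + e^eps),
    otherwise flip it. This mechanism is eps-differentially private."""
    p_truth = math.exp(eps) / (1 + math.exp(eps))
    return bit if rng.random() < p_truth else 1 - bit

eps = math.log(3)
out = randomized_response(1, eps, random.Random(0))

# The eps-DP guarantee, checked on the output probabilities directly:
p_same = math.exp(eps) / (1 + math.exp(eps))  # Pr[output = b | input = b]
p_flip = 1 - p_same                           # Pr[output = b | input = 1-b]
ratio = p_same / p_flip                       # equals e^eps
print(out, ratio)
```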
Properties of DP

Privacy has a price: the minimum data set size usually scales as 1/ε.

Composable adaptively: if A₁ is ε₁-DP and A₂ is ε₂-DP, then the adaptive composition (A₁, A₂) is (ε₁ + ε₂)-DP.

Or better [DRV 10]: for every ε and δ > 0, the composition of k ε-DP algorithms is (ε·√(2k·ln(1/δ)) + k·ε·(e^ε − 1), δ)-DP.
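The two composition bounds are easy to compare numerically (a sketch; the parameter values are arbitrary illustrations):

```python
import math

def basic_composition(eps, k):
    """k-fold adaptive composition of eps-DP algorithms is (k*eps)-DP."""
    return k * eps

def advanced_composition(eps, k, delta):
    """[DRV10]: the same composition is also (eps', delta)-DP with
    eps' = eps*sqrt(2k*ln(1/delta)) + k*eps*(e^eps - 1)."""
    return eps * math.sqrt(2 * k * math.log(1 / delta)) + k * eps * (math.exp(eps) - 1)

eps, k, delta = 0.01, 1000, 1e-6
basic = basic_composition(eps, k)
adv = advanced_composition(eps, k, delta)
print(basic, adv)  # the advanced bound scales with sqrt(k) instead of k
```

For small ε, the advanced bound grows like √k rather than k, which is what makes answering many adaptive queries affordable.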
DP implies generalization

Let L be a loss function and A an ε-DP algorithm. Then for every distribution P over X:

  E[ (1/n) Σᵢ L(A(D), xᵢ) ] ≤ e^ε · E[ L(A(D), x) ]   (D ∼ Pⁿ, x ∼ P fresh)

That is, the expected empirical loss and the expected true loss agree up to a factor of e^ε. DP composition then implies that DP algorithms can reuse data adaptively.
Proof

For i ∈ [n] and y ∈ X, let D^{i←y} be D with the i-th element replaced by y. By ε-DP, for any fixed i, y and data set D,

  E_A[ L(A(D), xᵢ) ] ≤ e^ε · E_A[ L(A(D^{i←y}), xᵢ) ].

Taking expectation over D ∼ Pⁿ and y ∼ P (so that xᵢ is independent of D^{i←y}), and averaging over i, bounds the expected empirical loss by e^ε times the expected true loss.
Counting queries

Counting query on a data set D = (x₁, …, x_n): for a function f : X → {0,1}, the value q_f(D) = (1/n) Σᵢ f(xᵢ).

DP algorithms for approximate answering of counting queries, posed by the data analyst(s) to a query release algorithm, have been actively studied for about 10 years.
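A single counting query can be answered ε-differentially privately with the Laplace mechanism: changing one element of D moves q_f by at most 1/n, so Laplace noise of scale 1/(nε) suffices (a minimal sketch; the bit-valued toy data set is an assumption for illustration):

```python
import numpy as np

def private_counting_query(data, f, eps, rng):
    """Laplace mechanism for a counting query q_f(D) = (1/n) sum_i f(x_i):
    the sensitivity is 1/n, so noise of scale 1/(n*eps) gives eps-DP."""
    n = len(data)
    exact = np.mean([f(x) for x in data])
    return exact + rng.laplace(scale=1.0 / (n * eps))

rng = np.random.default_rng(2)
data = rng.integers(0, 2, size=1000)  # toy data set: one sensitive bit each
frac_ones = lambda x: float(x)        # query: fraction of ones
answers = [private_counting_query(data, frac_ones, eps=1.0, rng=rng)
           for _ in range(200)]
print(data.mean(), np.mean(answers))
```

With n = 1000 and ε = 1 the noise has scale 0.001, so the privately released answers are still accurate to well within a typical tolerance τ.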
From private counting to SQs

Let Q be an (adaptive) query-asking strategy, and let M be an algorithm that answers the counting queries of Q such that:
- M is ε-DP, and
- for any data set D, w.p. ≥ 1 − β its answers are τ-accurate on D.

Then for any distribution P over X, M applied to D ∼ Pⁿ outputs, with high probability, valid responses to the SQs of Q, provided that ε, τ and n are suitably related.

The statement can be extended to (ε, δ)-DP with an appropriate adjustment of the parameters.
Proof I

For i ∈ [k], let φᵢ denote the i-th query asked by Q. It depends only on D and the randomness of M (and nothing else).

By the accuracy of M and a union bound, w.p. ≥ 1 − β every answer is within τ of the empirical mean q_{φᵢ}(D). It remains to show that the empirical means are close to the true expectations; for each i, let Bᵢ denote the event that q_{φᵢ}(D) deviates from E_P[φᵢ].
Proof II

Concentration of q_{φᵢ}(D) around E_P[φᵢ] then follows from a moment bound combined with Markov's inequality.
Proof: moment bound

It suffices to bound the moments of the deviation q_{φᵢ}(D) − E_P[φᵢ]. The key step: ε-DP approximately preserves conditional expectations, so even a query chosen based on previous (privately computed) answers has empirical mean close to its true expectation.
Corollaries

There exists an algorithm that can answer k adaptively chosen SQs such that with probability ≥ 1 − β the answers are τ-valid using n = Õ(√k) random samples (for constant τ and β). The algorithm runs in time polynomial in |X| per query.

It is obtained from: there exists an ε-DP algorithm that can answer k (adaptive) counting queries such that with probability ≥ 1 − β the answers are τ-accurate, provided that n is large enough; it also runs in time polynomial in |X| per query. An (ε, δ)-DP version is also known [HR10].
MWU + Sparse Vector

Initialize a synthetic distribution over X (e.g. uniform). For each query:
- if the synthetic answer is close to the (noisy) empirical answer, answer with the synthetic value;
- else answer with the empirical value plus Laplace noise and perform a multiplicative-weights update of the synthetic distribution.

At most roughly log|X|/τ² MWU updates are ever needed.

Sparse Vector Technique [DNRRV09]: privacy loss is incurred only when the approximate comparison with the threshold fails.
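A minimal sketch of the AboveThreshold form of the sparse vector technique, simplified to halt at the first crossing (the queries, domain, and parameter values are illustrative assumptions, not the talk's algorithm verbatim):

```python
import numpy as np

def above_threshold(data, queries, threshold, eps, rng):
    """Sparse-vector (AboveThreshold) sketch for counting queries of
    sensitivity 1/n: compare each noisy answer to a noisy threshold and
    halt at the first crossing; privacy budget is spent only there."""
    n = len(data)
    noisy_t = threshold + rng.laplace(scale=2.0 / (eps * n))
    results = []
    for q in queries:
        val = np.mean([q(x) for x in data]) + rng.laplace(scale=4.0 / (eps * n))
        if val >= noisy_t:
            results.append("above")
            return results  # stop: the budget for one crossing is spent
        results.append("below")
    return results

rng = np.random.default_rng(3)
data = rng.random(500)
# Illustrative queries: phi_t(x) = 1{x < t}, so the true answers are ~t.
queries = [lambda x, t=t: float(x < t) for t in (0.1, 0.2, 0.8, 0.9)]
out = above_threshold(data, queries, threshold=0.4, eps=50.0, rng=rng)
print(out)
```

The "below" answers are essentially free in the privacy accounting; only the crossing is charged, which is what allows many queries on a modest budget.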
Threshold validation queries

Threshold SQ: a query φ together with a threshold value; a τ-valid response is "YES" if the true expectation is above the threshold (up to tolerance τ), and "NO" otherwise.

There exists an algorithm that can answer k adaptively chosen threshold SQs such that with probability ≥ 1 − β the answers are τ-valid, as long as at most m of the comparisons fail, using a number of random samples that grows with √m but only logarithmically with k. The algorithm runs in time linear in the data set size per query.
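A toy sketch in this spirit (a hypothetical simplification for illustration: the failure-budget accounting and the noise calibration of the actual algorithm are omitted or simplified, and the parameter values are arbitrary):

```python
import numpy as np

def reusable_holdout(train, holdout, queries, threshold=0.04, sigma=0.01, rng=None):
    """Hypothetical simplification of a threshold-validation oracle: answer
    each query with the training-set mean unless it disagrees with the
    holdout mean by more than a noisy threshold; only then consult the
    holdout (this is where the real algorithm charges its failure budget)."""
    rng = rng if rng is not None else np.random.default_rng()
    answers = []
    for q in queries:
        t_mean = np.mean([q(x) for x in train])
        h_mean = np.mean([q(x) for x in holdout])
        if abs(t_mean - h_mean) > threshold + rng.laplace(scale=sigma):
            answers.append(h_mean + rng.laplace(scale=sigma))  # comparison failed
        else:
            answers.append(t_mean)
    return answers

rng = np.random.default_rng(4)
train, holdout = rng.random(1000), rng.random(1000)
queries = [lambda x, t=t: float(x < t) for t in (0.2, 0.5, 0.8)]
ans = reusable_holdout(train, holdout, queries, rng=rng)
print(ans)
```

Because the holdout is touched through noisy comparisons only, it stays "fresh" far longer than if its exact empirical means were released.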
Applications

Split the DATA into a working set, used directly by the learning algorithm(s), and a validation set that is accessed only through the SQ oracle.
Conclusions

- Adaptive data manipulations can cause overfitting/false discovery
- Theoretical model of the problem based on SQs
- Using exact empirical means is risky; DP provably preserves "freshness" of samples: adding noise can provably prevent overfitting
- In applications, not all data must be used with DP
Future work

- Better (direct) bounds on the sample complexity?
- Is the use of DP necessary?
- Better dependence on the tolerance τ?
- Efficient algorithms for special cases?
- Practical implementations?

THANKS!