Slide1
Understanding Generalization in Adaptive Data Analysis
Vitaly Feldman
Slide2
Overview
Adaptive data analysis
Motivation
Definitions
Basic techniques
With Dwork, Hardt, Pitassi, Reingold, Roth [DFHPRR 14,15]
New results [F, Steinke 17]
Open problems
Slide3
Learning problem
[Diagram: Distribution over domain, Data, Analysis (XGBoost, SVRG, Adagrad, SVM), Model, =?]
Slide4
Statistical inference
Generalization guarantees for
Algorithm
i.i.d. samples from
Theory
Model complexity
Rademacher complexity
Stability
Online-to-batch
…
Slide5
Data analysis is adaptive
Exploratory data analysis
Feature selection
Model stacking
Hyper-parameter tuning
Shared datasets
…
Steps depend on previous analyses of the same dataset
Data analyst(s)
Slide6
“Quiet scandal of statistics” [Leo Breiman, 1992]
Thou shalt not test hypotheses suggested by data
Slide7
ML practice
[Diagram: Data split into Training and Testing]
Test error of
Lasso, k-NN, SVM, C4.5, Kernels
Slide8
ML practice now
[Diagram: Data split into Training, Validation, and Testing]
Test error of
XGBoost, SVRG, Tensorflow
Slide9
Adaptive data analysis [DFHPRR 14]
Data analyst(s)
Goal: given the data, compute values “close” to running each analysis on fresh samples
Each analysis is a query
Design algorithm for answering adaptively-chosen queries
Algorithm
Slide10
Adaptive statistical queries
Example:
Data analyst(s)
Can measure correlations, moments, accuracy/loss
Run any statistical query algorithm
with prob.
Statistical query oracle
[Kearns 93]
Slide11
Given non-adaptive query functions and i.i.d. samples from the distribution, estimate the expectation of each query
Use empirical mean:
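The empirical-mean estimator for non-adaptive statistical queries can be sketched as follows. This is a minimal illustration, not code from the talk: the helper names, the toy distribution (pairs of independent fair bits), and the three example queries are all assumptions made here.

```python
import random


def empirical_mean(query, samples):
    """Answer one statistical query by averaging it over the sample."""
    return sum(query(x) for x in samples) / len(samples)


def answer_non_adaptive(queries, samples):
    """Answer a batch of queries that were all fixed before seeing the data."""
    return [empirical_mean(q, samples) for q in queries]


# Toy setup (hypothetical): each data point is a pair of independent fair bits.
rng = random.Random(0)
samples = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(10_000)]

# Each query maps a data point to [0, 1]; all three have true expectation 0.5.
queries = [
    lambda x: x[0],
    lambda x: x[1],
    lambda x: float(x[0] == x[1]),
]
answers = answer_non_adaptive(queries, samples)
print(answers)
```

Because the queries are chosen independently of the sample, each empirical mean concentrates around its expectation, and a union bound covers the whole batch.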
Answering non-adaptive SQs
Slide12
Answering adaptively-chosen SQs
Data splitting:
What if we use the empirical mean?
For some constant
Variable selection, boosting, bagging, step-wise regression, ...
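A small simulation can show why reusing the empirical mean for adaptively chosen queries is risky. The sketch below is an assumed toy example, not taken from the slides: features and labels are independent, so every classifier has true accuracy 0.5, yet a query built from earlier answers on the same sample looks far better. The sizes `n` and `d` are illustrative.

```python
import random

rng = random.Random(1)
n, d = 200, 1000  # sample size and number of features (illustrative values)


def draw(m):
    """Draw m points: d uniform +-1 features and an independent uniform +-1 label."""
    X = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(m)]
    y = [rng.choice((-1, 1)) for _ in range(m)]
    return X, y


X, y = draw(n)

# Round 1: d statistical queries fixed in advance, each the empirical
# correlation between one feature and the label.
corr = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(d)]

# Round 2: a single query chosen adaptively from the round-1 answers:
# the accuracy of a vote in which each feature is signed by its correlation.
signs = [1 if c >= 0 else -1 for c in corr]


def predict(x):
    return 1 if sum(s * xj for s, xj in zip(signs, x)) >= 0 else -1


def accuracy(X, y):
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)


reused_accuracy = accuracy(X, y)        # empirical mean on the reused sample
fresh_accuracy = accuracy(*draw(1000))  # the same query on fresh samples
print(f"reused data: {reused_accuracy:.2f}, fresh data: {fresh_accuracy:.2f}")
```

On the reused sample the reported accuracy is typically far above 0.5, while on fresh samples it stays near 0.5. This is the kind of failure that variable selection followed by evaluation on the same data can produce.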
Slide13
Answering adaptive SQs
[Bassily, Nissim, Smith, Steinke, Stemmer, Ullman 15]
Generalizes to low-sensitivity analyses: when two datasets differ in a single element
Estimates within
[DFHPRR 14] There exists an algorithm that can answer adaptively chosen SQs with accuracy for
Data splitting:
Slide14
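Differential privacy [Dwork, McSherry, Nissim, Smith 06]
Randomized algorithm $M$ is $(\epsilon, \delta)$-differentially private if for any two data sets $S, S'$ that differ in one element:
$\forall Z \subseteq \mathrm{range}(M)$: $\Pr[M(S) \in Z] \le e^{\epsilon} \cdot \Pr[M(S') \in Z] + \delta$
(ratio bounded)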
Slide15
DP composes adaptively
Composition of $k$ $\epsilon$-DP algorithms: for every $\delta > 0$, is $\left(\epsilon\sqrt{2k\ln(1/\delta)} + k\epsilon(e^{\epsilon} - 1),\ \delta\right)$-DP [Dwork, Rothblum, Vadhan 10]
DP implies generalization
Differential privacy is stability
Implies strongly uniform replace-one stability and generalization in expectation
DP implies generalization with high probability [DFHPRR 14, BNSSSU 15]
Slide16
Value perturbation [DMNS 06]
Answer low-sensitivity query with its value plus Gaussian noise
Given the samples, achieves error that depends on the worst-case sensitivity of the query
The error bound could be much larger than the standard deviation of the query's value
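Value perturbation with Gaussian noise can be sketched as below. This is a hedged illustration rather than the talk's algorithm: the noise scale uses the classical Gaussian-mechanism calibration $\sigma = \Delta\sqrt{2\ln(1.25/\delta)}/\epsilon$ (valid for $0 < \epsilon < 1$), and the function names, the mean query, and the parameter values are assumptions chosen for the example.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale from the classical Gaussian-mechanism bound (assumes 0 < epsilon < 1)."""
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturbed_answer(query, samples, sensitivity, epsilon, delta, rng):
    """Evaluate a low-sensitivity query on the sample and add Gaussian noise."""
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return query(samples) + rng.gauss(0.0, sigma)


rng = random.Random(0)
samples = [rng.random() for _ in range(10_000)]  # toy data in [0, 1]


def mean_query(s):
    # Replacing one element of s changes the mean by at most 1/n.
    return sum(s) / len(s)


exact = mean_query(samples)
noisy = perturbed_answer(
    mean_query, samples,
    sensitivity=1.0 / len(samples), epsilon=0.5, delta=1e-6, rng=rng,
)
print(exact, noisy)
```

The noise is calibrated to the worst-case sensitivity, so a query whose value is much more concentrated than its sensitivity suggests still receives the full amount of noise.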
Slide17
Beyond low-sensitivity
[F, Steinke 17] There exists an algorithm that, for any adaptively-chosen sequence of queries, given i.i.d. samples from the distribution, outputs values such that w.h.p. for all queries:
where
For statistical queries: given samples get error that scales as
Value perturbation:
Slide18
Stable Median
Find an approximate median with DP relative to the dataset
Approximate median: a value greater than the bottom 1/3 and smaller than the top 1/3 of the values
Slide19
Median algorithms
Requires discretization: ground set
Upper bound: samples
Lower bound: samples [Bun, Nissim, Stemmer, Vadhan 15]
Exponential mechanism [McSherry, Talwar 07]
Output with prob.
Uses samples
Stability and confidence amplification for the price of one factor!
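One way to picture the exponential mechanism for an approximate median over a discretized ground set is the sketch below. The score function (distance of a candidate's rank from n/2), the integer ground set, and the parameter values are assumptions made for illustration; the slide's exact scoring rule is not reproduced here.

```python
import math
import random


def dp_median(values, ground_set, epsilon, rng):
    """Exponential mechanism: sample y from ground_set with probability
    proportional to exp(epsilon * score(y) / 2), where
    score(y) = -|#{v in values : v <= y} - n/2| has replace-one sensitivity 1."""
    n = len(values)
    scores = [-abs(sum(v <= y for v in values) - n / 2) for y in ground_set]
    top = max(scores)  # shift scores before exponentiating to avoid underflow
    weights = [math.exp(epsilon * (s - top) / 2) for s in scores]
    return rng.choices(ground_set, weights=weights, k=1)[0]


rng = random.Random(0)
ground_set = list(range(0, 1001))  # discretized domain (assumed)
values = [min(1000, max(0, round(rng.gauss(500, 20)))) for _ in range(300)]
m = dp_median(values, ground_set, epsilon=1.0, rng=rng)
print(m)
```

Candidates whose rank is far from the middle get exponentially small weight, so with enough values the output lands between the bottom third and the top third with high probability.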
Slide20
Analysis
Differential privacy approximately preserves quantiles
If the value is within empirical quantiles then it is within true quantiles
is within mean
If the function is well-concentrated on the distribution then easy to prove high probability bounds
[F, Steinke 17] Let $M$ be a DP algorithm that on an input dataset outputs a function and a value. Then w.h.p. over the dataset and the output:
Slide21
Limits
Any algorithm for answering adaptively chosen SQs with accuracy requires* samples [Hardt, Ullman 14; Steinke, Ullman 15]
*in sufficiently high dimension or under crypto assumptions
Verification of responses to queries: where is the number of queries that failed verification
Data splitting if overfitting [DFHPRR 14]
Reusable holdout [DFHPRR 15]
Maintaining public leaderboard in a competition [Blum, Hardt 15]
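The reusable-holdout idea can be sketched along the lines of the Thresholdout procedure from the reusable holdout work. This is a simplified version under stated assumptions: the class name, the default threshold and noise scale, the Laplace noise multipliers, and the toy data are illustrative choices and are not taken from the slides.

```python
import random


class Thresholdout:
    """Simplified sketch: answer from the training set unless it disagrees
    with the holdout set by more than a noisy threshold."""

    def __init__(self, train, holdout, threshold=0.04, sigma=0.01, budget=100, seed=0):
        self.train, self.holdout = train, holdout
        self.threshold, self.sigma, self.budget = threshold, sigma, budget
        self.rng = random.Random(seed)
        self.noisy_threshold = threshold + self._laplace(2 * sigma)

    def _laplace(self, scale):
        # The difference of two i.i.d. exponentials is Laplace-distributed.
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def answer(self, query):
        """query maps a data point to [0, 1]; returns None once the budget is spent."""
        if self.budget < 1:
            return None
        train_mean = sum(map(query, self.train)) / len(self.train)
        holdout_mean = sum(map(query, self.holdout)) / len(self.holdout)
        gap = abs(holdout_mean - train_mean)
        if gap > self.noisy_threshold + self._laplace(4 * self.sigma):
            # Failed verification: spend budget and release a noisy holdout value.
            self.budget -= 1
            self.noisy_threshold = self.threshold + self._laplace(2 * self.sigma)
            return holdout_mean + self._laplace(self.sigma)
        return train_mean


# Toy (feature, label) points, hypothetical: the training set is perfectly
# "explained" by the feature, the holdout set is not.
train = [(1, 1)] * 500
holdout = [(1, 1)] * 250 + [(1, 0)] * 250
t = Thresholdout(train, holdout, sigma=0.001)
agree = t.answer(lambda p: float(p[0] == 1))        # same mean on both sets
overfit = t.answer(lambda p: float(p[0] == p[1]))   # 1.0 on train, 0.5 on holdout
print(agree, overfit, t.budget)
```

Only queries that fail verification consume budget, which is why the cost depends on the number of overfitting queries rather than on the total number of queries.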
Slide22
Open problems
Analysts without side information about the distribution
Queries depend only on previous answers
Fixed “natural” analyst/Learning algorithm
Gradient descent for stochastic convex optimization
Does there exist an SQ analyst whose queries require more than samples to answer? (with accuracy/confidence)
Slide23
Stochastic convex optimization
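Convex body $K \subseteq \mathbb{R}^d$
Class $F$ of convex 1-Lipschitz functions over $K$
Given $f_1, \ldots, f_n$ sampled i.i.d. from unknown $P$ over $F$
Minimize true (expected) objective: $f_P(x) = \mathbb{E}_{f \sim P}[f(x)]$ over $K$:
Find $\bar{x}$ s.t. $f_P(\bar{x}) \le \min_{x \in K} f_P(x) + \epsilon$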
Slide24
Gradient descent
ERM via projected gradient descent:
Initialize
For $t = 1$ to $T$:
Output:
Overall: statistical queries with accuracy in adaptive rounds
Sample splitting: samples
DP: samples
Sample complexity is unknown
Uniform convergence: samples (tight [F. 16])
SGD solves using samples [Robbins, Monro 51; Polyak 90]
Fresh samples:
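ERM via projected gradient descent can be sketched as follows. The concrete choices are assumptions for illustration only: the convex body is taken to be the Euclidean unit ball, the sampled functions are $f_i(x) = |\langle a_i, x \rangle - b_i|$ with $\|a_i\| = 1$ (convex and 1-Lipschitz), the step size is $1/\sqrt{T}$, and the output is the average iterate. Each gradient coordinate is an average over the sample, which is why each step corresponds to a round of statistical queries.

```python
import math
import random


def project_to_unit_ball(x):
    """Euclidean projection onto the unit ball (the convex body assumed here)."""
    norm = math.sqrt(sum(c * c for c in x))
    return x if norm <= 1.0 else [c / norm for c in x]


def empirical_objective(x, data):
    """Average of f_i(x) = |<a_i, x> - b_i| over the sample."""
    return sum(abs(sum(ac * xc for ac, xc in zip(a, x)) - b) for a, b in data) / len(data)


def empirical_subgradient(x, data):
    """Subgradient of the empirical objective; each coordinate is a sample average."""
    g = [0.0] * len(x)
    for a, b in data:
        r = sum(ac * xc for ac, xc in zip(a, x)) - b
        s = (r > 0) - (r < 0)  # sign of the residual; 0 is a valid choice at r == 0
        for j, ac in enumerate(a):
            g[j] += s * ac / len(data)
    return g


def projected_gradient_descent(data, dim, steps):
    eta = 1.0 / math.sqrt(steps)  # step size for a 1-Lipschitz objective
    x = [0.0] * dim
    avg = [0.0] * dim
    for _ in range(steps):
        avg = [m + c / steps for m, c in zip(avg, x)]  # running average of iterates
        g = empirical_subgradient(x, data)
        x = project_to_unit_ball([c - eta * gc for c, gc in zip(x, g)])
    return avg


rng = random.Random(0)
x_true = [0.6, -0.3, 0.2, 0.0, 0.0]  # hypothetical minimizer inside the ball
data = []
for _ in range(200):
    a = [rng.gauss(0, 1) for _ in x_true]
    norm = math.sqrt(sum(c * c for c in a))
    a = [c / norm for c in a]
    data.append((a, sum(ac * xc for ac, xc in zip(a, x_true))))

x_bar = projected_gradient_descent(data, dim=5, steps=400)
print(empirical_objective(x_bar, data))
```

In this noiseless toy instance the empirical minimum is 0, so the standard subgradient-method analysis bounds the objective at the average iterate by $1/\sqrt{T}$.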
Slide25
Conclusions
Real-valued analyses (without any assumptions)
Going beyond tools from DP
Other notions of stability for outcomes
Max/mutual information
Generalization beyond uniform convergence
Using these techniques in practice