Slide1
Generalization in Adaptive Data via Max-Information
Vitaly Feldman, Accelerated Discovery Lab, IBM Research - Almaden
Cynthia Dwork (Microsoft Res.), Moritz Hardt (Google Res.), Toni Pitassi (U. of Toronto), Omer Reingold (Samsung Res.), Aaron Roth (Penn, CS)
Slide2
[Diagram: Distribution → Analysis → Results (param. estimates, classifier, clustering, etc.)]
Slide3
Statistical inference
“Fresh” i.i.d. samples → Algorithm (hypothesis tests, regression, learning) → Result + generalization guarantees
Techniques: CLT, model complexity, Rademacher complexity, stability, …
Slide4
Data analysis is adaptive
Data cleaning, exploratory data analysis, variable selection, hyper-parameter tuning, shared datasets, …
[Diagram: Data analyst(s)]
Steps depend on previous analyses of the same dataset
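The effect of letting later steps depend on earlier analyses of the same dataset can be seen in a small simulation. The sketch below is illustrative only: the setup (independent coin-flip features and labels, the `1/sqrt(n)` cutoff, the signed vote) and every name in it are assumptions of this example, not something taken from the talk. Variables are selected because they look correlated with the label, and the resulting classifier is then scored on the very data used for the selection.

```python
import math
import random


def adaptive_selection_demo(n=200, d=500, seed=0):
    """Select variables using the data, then score the result on the same data.

    Hypothetical setup (not from the slides): features and labels are
    independent fair coin flips in {-1, +1}, so no classifier can truly
    beat 50% accuracy.
    """
    rng = random.Random(seed)

    def draw(rows):
        X = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(rows)]
        y = [rng.choice((-1, 1)) for _ in range(rows)]
        return X, y

    X, y = draw(n)            # the dataset that gets reused
    X_new, y_new = draw(n)    # fresh samples, used only for the final check

    # Step 1: empirical correlation of every variable with the label.
    corr = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(d)]

    # Step 2 (adaptive): keep the variables whose correlation looks large,
    # together with the sign of that correlation.
    cutoff = 1.0 / math.sqrt(n)
    selected = [(j, 1 if c > 0 else -1) for j, c in enumerate(corr) if abs(c) > cutoff]

    # Step 3: signed majority vote over the selected variables.
    def predict(row):
        return 1 if sum(s * row[j] for j, s in selected) >= 0 else -1

    def accuracy(rows, labels):
        return sum(predict(r) == t for r, t in zip(rows, labels)) / len(labels)

    return accuracy(X, y), accuracy(X_new, y_new)


reused_acc, fresh_acc = adaptive_selection_demo()
print(f"accuracy on the reused dataset: {reused_acc:.2f}")
print(f"accuracy on fresh samples:      {fresh_acc:.2f}")
```

Since the labels carry no signal, the accuracy on fresh samples should stay near one half, while the accuracy measured on the reused dataset is typically far higher. That gap is the generalization failure the rest of the talk is concerned with.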
Slide5
It’s an old problem
“Quiet scandal of statistics” [Leo Breiman, 1992]
Thou shalt not test hypotheses suggested by data
Slide6
Is this a real problem?
“Why Most Published Research Findings Are False” [Ioannidis 2005]
Adaptive data analysis is one of the causes:
p-hacking
Researcher degrees of freedom [Simmons, Nelson, Simonsohn 2011]
Garden of forking paths [Gelman, Loken 2015]
“Irreproducible preclinical research exceeds 50%, resulting in approximately US$28B/year loss” [Freedman, Cockburn, Simcoe 2015]
Slide7
Existing approaches I
Abstinence
Pre-registration
[Image © Center for Open Science]
Slide8
Existing approaches II
Selective/post-selection inference
Examples:
Model selection + parameter inference
Variable selection + regression
Survey: [Taylor, Tibshirani 2015]
Slide9
Existing approaches III
Sample splitting
[Diagram: Data split into parts A, B, C]
Might be necessary for standard techniques
Slide10
Adaptive data analysis
[Diagram: Data analyst(s)]
Assumption: […] is “valid” with high prob.
Goal: […] is “almost as good as” […] w.h.p.
Approach: control the increase in probability of any event as a result of dependence
Slide11
Adaptive statistical queries [DFHPRR14]
[Diagram: Data analyst(s) ↔ Statistical query oracle [Kearns 93]]
[…] with high prob.
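In the standard statistical query model, the analyst submits a function phi mapping a data point to [0, 1] and receives an estimate v of the expectation of phi under the data distribution, which should be within some tolerance tau of the truth. The sketch below shows that interaction with the most naive oracle, one that answers with the empirical mean on a fixed sample. The class name `EmpiricalSQOracle`, the uniform data, and the value of `tau` are assumptions made for illustration; the slide's own notation did not survive extraction.

```python
import random


class EmpiricalSQOracle:
    """Answers each query phi: X -> [0, 1] with its empirical mean on a fixed sample."""

    def __init__(self, samples):
        self.samples = list(samples)

    def answer(self, phi):
        return sum(phi(x) for x in self.samples) / len(self.samples)


rng = random.Random(1)
samples = [rng.random() for _ in range(2000)]   # hypothetical data: Uniform[0, 1]
oracle = EmpiricalSQOracle(samples)

# First query: probability mass below 0.5 (true expectation is 0.5).
v1 = oracle.answer(lambda x: 1.0 if x < 0.5 else 0.0)

# Second query is chosen adaptively: its threshold is the previous answer.
# Under Uniform[0, 1] its true expectation equals that threshold.
v2 = oracle.answer(lambda x: 1.0 if x < v1 else 0.0)

tau = 0.05   # hypothetical tolerance
print(abs(v1 - 0.5) <= tau, abs(v2 - v1) <= tau)
```

Two queries are harmless here. The concern in the adaptive setting is that the empirical-mean oracle's accuracy guarantee can degrade quickly once many queries are chosen as a function of earlier answers.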
Slide12
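Outcome stability/differential privacy [Dwork, McSherry, Nissim, Smith 06]
Randomized algorithm $A$ is $\epsilon$-differentially private if for any two data sets $S, S'$ that differ in a single element and any set of outcomes $Z$:
$\Pr[A(S) \in Z] \le e^{\epsilon} \cdot \Pr[A(S') \in Z]$
[Figure: output distributions of $A$ on the two data sets; ratio bounded]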
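A common way to obtain an epsilon-differentially private answer to a bounded-mean query is the Laplace mechanism. The sketch below is a minimal illustration, not the mechanism used in the talk: the function names, the dataset, and the query are all hypothetical. It relies on the fact that replacing one of n samples changes the empirical mean of a [0, 1]-valued function by at most 1/n.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(samples, phi, epsilon, rng):
    """Epsilon-DP estimate of the empirical mean of phi: X -> [0, 1].

    Replacing one sample moves the empirical mean by at most 1/n, so Laplace
    noise with scale 1/(epsilon * n) gives epsilon-differential privacy.
    """
    n = len(samples)
    exact = sum(phi(x) for x in samples) / n
    return exact + laplace_noise(1.0 / (epsilon * n), rng)


rng = random.Random(7)
data = [rng.random() for _ in range(1000)]       # hypothetical dataset
phi = lambda x: 1.0 if x > 0.25 else 0.0         # hypothetical query
exact = sum(phi(x) for x in data) / len(data)
noisy = private_mean(data, phi, epsilon=1.0, rng=rng)
print(f"exact = {exact:.4f}, private = {noisy:.4f}")
```

With n = 1000 and epsilon = 1 the noise scale is 0.001, so the private answer stays very close to the exact empirical mean while satisfying the ratio bound above.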
Slide13
DP composes adaptively
[Diagram: algorithms indexed by i, each labeled “-DP”, applied one after another]
Slide14
DP composes adaptively
[Diagram, continued: algorithms indexed by i, each labeled “-DP”]
Slide15
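DP composes adaptively
Composition of $k$ $\epsilon$-DP algorithms: for every $\delta > 0$, is $(\epsilon', \delta)$-DP with $\epsilon' = \epsilon\sqrt{2k\ln(1/\delta)} + k\epsilon(e^{\epsilon} - 1)$ [Dwork, Rothblum, Vadhan 10]
DP implies generalization
$\epsilon$-DP upper-bounds the increase in probability of any “bad” event
$\epsilon$-DP ensures generalization for SQs [DFHPRR 14]
$(\epsilon, \delta)$-DP case strengthened/extended [BNSSSU 15]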
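To get a feel for what adaptive composition costs, the sketch below compares the basic bound (the privacy parameters simply add up) with the advanced composition bound of Dwork, Rothblum and Vadhan, under which k-fold composition of epsilon-DP algorithms is (epsilon', delta)-DP with epsilon' = epsilon * sqrt(2k ln(1/delta)) + k * epsilon * (e^epsilon - 1). The function names and the parameter values are hypothetical choices for this example.

```python
import math


def basic_composition(epsilon, k):
    """Pure-DP bound: k adaptively chosen epsilon-DP algorithms are (k * epsilon)-DP."""
    return k * epsilon


def advanced_composition(epsilon, k, delta):
    """Epsilon' such that the k-fold composition is (epsilon', delta)-DP
    under the Dwork-Rothblum-Vadhan advanced composition bound."""
    return (epsilon * math.sqrt(2 * k * math.log(1 / delta))
            + k * epsilon * (math.exp(epsilon) - 1))


eps, k, delta = 0.01, 10_000, 1e-6   # hypothetical parameters
print(f"basic:    {basic_composition(eps, k):.3f}")
print(f"advanced: {advanced_composition(eps, k, delta):.3f}")
```

For small epsilon and large k the advanced bound grows roughly like sqrt(k) rather than k, which is what makes answering many adaptive queries feasible.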
Slide16
Description length […]: Composes adaptively; Preserves generalization (of subsequent analysis)
Let […] : […] be an algorithm s.t. […]. Then for any event […], and […] over […]: […]
For any […] : […] s.t. […] and […] : […] s.t. […]. Then for […]: […]
Define […]. Then for […], […] gives […]
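One standard way to see why a short output description preserves generalization is a union bound over the possible outputs. The statement below is a sketch in notation chosen for this note ($A$, $X^n$, $Y$, $R(y)$, $\beta$ are assumptions, since the slide's formulas did not survive extraction); it follows the form of the description-length argument in [DFHPRR 15].

```latex
Let $A : X^n \to Y$ with $|Y|$ finite, and for each $y \in Y$ let
$R(y) \subseteq X^n$ be a set of ``bad'' datasets with
$\Pr_{S}[S \in R(y)] \le \beta$. Since the event $S \in R(A(S))$ implies
$S \in R(y)$ for at least one $y \in Y$,
\[
  \Pr_{S}\bigl[S \in R(A(S))\bigr]
  \;\le\; \sum_{y \in Y} \Pr_{S}\bigl[S \in R(y)\bigr]
  \;\le\; |Y| \cdot \beta .
\]
If the output of $A$ can be described with $k$ bits, then $|Y| \le 2^k$ and
the bound becomes $2^k \beta$.
```

For example, if each fixed query fails with probability at most $\beta$ by a standard concentration bound, a query selected from the data among at most $2^k$ candidates fails with probability at most $2^k \beta$.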
Slide17
Max-information
Max-information: […] = […]
For […] : […] s.t. […], where […] over […]. Then for any event […], and […]: […]
For any […] : […] s.t. […] and […] : […] s.t. […]. Then for […]: […]
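For finite distributions, max-information can be computed directly. The sketch below assumes the definition used in [DFHPRR 15], namely I_inf(X; Y) = log2 of the largest ratio Pr[X = x, Y = y] / (Pr[X = x] * Pr[Y = y]), and checks the resulting guarantee that any event is at most 2^k times more likely under the joint distribution than under the product of the marginals. The function names and the toy distribution are hypothetical.

```python
import math
from collections import defaultdict


def marginals(joint):
    """Marginal distributions of X and Y from a dict (x, y) -> probability."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py


def max_information(joint):
    """Max-information I_inf(X; Y) in bits for a finite joint distribution."""
    px, py = marginals(joint)
    return math.log2(max(p / (px[x] * py[y]) for (x, y), p in joint.items() if p > 0))


def event_probabilities(joint, event):
    """Probability of `event` under the joint law and under the product of marginals."""
    px, py = marginals(joint)
    dependent = sum(p for (x, y), p in joint.items() if event(x, y))
    independent = sum(px[x] * py[y] for x in px for y in py if event(x, y))
    return dependent, independent


# Hypothetical example: Y copies a uniform bit X with probability 0.9.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
k = max_information(joint)
dep, ind = event_probabilities(joint, lambda x, y: x == y)
print(f"k = {k:.3f} bits, Pr[joint] = {dep:.2f}, Pr[product] = {ind:.2f}")
print("bound holds:", dep <= 2 ** k * ind + 1e-12)
```

In the adaptive setting X plays the role of the dataset and Y the output of an earlier analysis, so the product distribution corresponds to running the later analysis on fresh data.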
Slide18
Max-info from $\epsilon$-DP
By $\epsilon$-DP for any adjacent […] and any […]: […]
For any […] and any […]: […], and so […]
Thus […]
For concentration event need […] and so require […]
For […] use concentration of divergence: […]
Implies that […] suffices
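The first, worst-case step of this argument can be written out in a few lines. The derivation below is a sketch in notation chosen for this note (lowercase $s, s'$ for fixed datasets, $S$ for the random dataset, discrete outputs $y$); it covers only the bound obtained from applying the privacy definition along a chain of adjacent datasets, not the sharper concentration-based bound that the later lines of the slide refer to.

```latex
Let $A$ be $\epsilon$-DP on datasets in $X^n$. Any $s, s' \in X^n$ differ in
at most $n$ elements, so applying the definition along a chain of adjacent
datasets gives, for every output $y$,
\[
  \Pr[A(s) = y] \;\le\; e^{\epsilon n} \, \Pr[A(s') = y].
\]
Averaging the right-hand side over $s'$ drawn from the distribution of $S$,
\[
  \Pr[A(S) = y \mid S = s] \;\le\; e^{\epsilon n} \, \Pr[A(S) = y]
  \quad \text{for all } s, y,
\]
and therefore
\[
  I_\infty\bigl(S; A(S)\bigr)
  \;=\; \log \max_{s, y} \frac{\Pr[A(S) = y \mid S = s]}{\Pr[A(S) = y]}
  \;\le\; \epsilon n \log e .
\]
```

This bound is only useful when $\epsilon n$ is small, which is why a concentration argument is needed to reach the regime where $\epsilon$ is comparable to the target accuracy.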
Slide19
Approximate max-information: […]
Preserves generalization
Composes adaptively
[…]-differential privacy
Description length, […]
Slide20
Further developments
Additional approaches:
Mutual information [Russo, Zou 2016]
KL-stability, TV-stability [Bassily, Nissim, Smith, Steinke, Stemmer, Ullman 2016]
Typical stability [Bassily, Freund 2016]
[…]-Differential privacy and […] [Rogers, Roth, Smith, Thakkar 2016]
Slide21
Conclusions
Adaptive data analysis:
Ubiquitous in practice
Invalidates standard generalization guarantees
Possible to model and improve on standard approaches
New theoretical approaches are needed
Max-info gives a general approach that captures differential privacy and description length
Known results can be rederived via max-info
Adaptive composition requires strong assumptions
Possibly too strong for some applications
Applications
Reusable holdout [DFHPRR 15]
Reusable holdout in ML competitions [Blum, Hardt 15]
Selection problems [Russo, Zou 16]