Generalization in Adaptive Data via Max-Information

Uploaded On 2017-03-15

Vitaly Feldman (Accelerated Discovery Lab, IBM Research Almaden), Cynthia Dwork (Microsoft Res), Moritz Hardt (Google Res), Toni Pitassi (U of Toronto), Omer Reingold (Samsung Res), Aaron Roth



Presentation Transcript

Slide1

Generalization in Adaptive Data via Max-Information

Vitaly Feldman
Accelerated Discovery Lab, IBM Research - Almaden

Cynthia Dwork (Microsoft Res.), Moritz Hardt (Google Res.), Toni Pitassi (U. of Toronto), Omer Reingold (Samsung Res.), Aaron Roth (Penn, CS)

Slide2

[Diagram: Distribution over the domain → Analysis → Results: param. estimates, classifier, clustering, etc.]

 Slide3

Statistical inference

“Fresh” i.i.d. samples → Algorithm (hypothesis tests, regression, learning) → Result + generalization guarantees

Techniques:
CLT
Model complexity
Rademacher compl.
Stability
…

Slide4

Data analysis is adaptive

Data cleaning
Exploratory data analysis
Variable selection
Hyper-parameter tuning
Shared datasets
…

Steps depend on previous analyses of the same dataset

[Diagram: Data analyst(s)]

 Slide5

It’s an old problem

“Quiet scandal of statistics” [Leo Breiman, 1992]

Thou shalt not test hypotheses suggested by data

Slide6

Is this a real problem?

“Why Most Published Research Findings Are False” [Ioannidis 2005]

Adaptive data analysis is one of the causes:
p-hacking
Researcher degrees of freedom [Simmons, Nelson, Simonsohn 2011]
Garden of forking paths [Gelman, Loken 2015]

“Irreproducible preclinical research exceeds 50%, resulting in approximately US$28B/year loss” [Freedman, Cockburn, Simcoe 2015]

Slide7

Existing approaches I

Abstinence
Pre-registration

© Center for Open Science

Slide8

Existing approaches II

Selective/post-selection inference

Examples:
Model selection + parameter inference
Variable selection + regression

Survey: [Taylor, Tibshirani 2015]

Slide9

Existing approaches III

Sample splitting

[Diagram: Data split into separate parts for analyses A, B and C]

Might be necessary for standard techniques

Slide10

Adaptive data analysis

[Diagram: Data analyst(s)]

Assumption: […] is “valid” with high prob.

Goal: […] is “almost as good as” […] w.h.p.

Approach: control the increase in probability of any event as a result of dependence

 Slide11

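Adaptive statistical queries [DFHPRR14]

Data analyst(s) adaptively choose queries $\phi_i : X \to [0,1]$; the oracle returns answers $v_i$ with $|v_i - \mathbf{E}_{x \sim P}[\phi_i(x)]| \le \tau$ with high prob., where $P$ is the data distribution over $X$

Statistical query oracle [Kearns 93]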
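A minimal sketch of why adaptivity matters for statistical queries, not the construction from [DFHPRR14]. It assumes the most naive oracle, one that answers each query with its empirical mean on a fixed dataset, and an analyst who builds one final query from the earlier answers. The names `empirical_sq_oracle`, `random_rows` and `adaptive_overfit_demo`, the pure-noise ±1 data and the sizes are all hypothetical choices made for illustration.

```python
import random

def empirical_sq_oracle(data):
    """Answer a query phi (one row -> value in [0, 1]) by its mean over the dataset."""
    def answer(phi):
        return sum(phi(row) for row in data) / len(data)
    return answer

def random_rows(rng, n, d):
    # Each row holds d features and a label (last entry), all independent uniform +/-1 bits,
    # so no feature carries any information about the label.
    return [[rng.choice((-1, 1)) for _ in range(d + 1)] for _ in range(n)]

def adaptive_overfit_demo(n=200, d=501, seed=0):
    rng = random.Random(seed)
    oracle = empirical_sq_oracle(random_rows(rng, n, d))

    # Round 1: d queries fixed in advance, "does feature j equal the label?".
    # The true value of each query is exactly 0.5.
    agree = [oracle(lambda row, j=j: float(row[j] == row[-1])) for j in range(d)]

    # Round 2: a single query built from the round-1 answers (the adaptive step).
    signs = [1 if a >= 0.5 else -1 for a in agree]

    def vote_query(row):
        # zip stops after the d features, so the label is excluded; d is odd, so vote != 0.
        vote = sum(s * v for s, v in zip(signs, row))
        return float((vote > 0) == (row[-1] > 0))

    empirical = oracle(vote_query)
    # The same query evaluated on data the analyst never touched.
    fresh = empirical_sq_oracle(random_rows(rng, 5 * n, d))(vote_query)
    return empirical, fresh

empirical, fresh = adaptive_overfit_demo()
print(f"answer on the reused dataset: {empirical:.2f}")
print(f"answer on fresh data:         {fresh:.2f}")
```

The true value of the final query is 0.5, since the labels are independent of the features. The answer on the reused dataset sits well above that, while the answer on fresh data stays close to it.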

 Slide12

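Outcome stability/differential privacy [Dwork, McSherry, Nissim, Smith 06]

Randomized algorithm $A$ is $(\epsilon, \delta)$-differentially private if for any two data sets $S, S'$ that differ in a single element:

$\Pr[A(S) \in Z] \le e^{\epsilon} \cdot \Pr[A(S') \in Z] + \delta$ for every set of outcomes $Z \subseteq \mathrm{range}(A)$

[Diagram: output distributions of $A(S)$ and $A(S')$, ratio bounded]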
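As an illustration of a differentially private algorithm in this setting, here is a sketch of the Laplace mechanism applied to one statistical query. The slide does not name a mechanism, so this choice is an assumption. The function names `laplace_noise` and `private_sq_answer` and the example data are hypothetical. The sketch relies on the standard fact that replacing one element of an n-element dataset moves the mean of a [0, 1]-valued function by at most 1/n, so Laplace noise of scale 1/(εn) gives ε-DP with δ = 0.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_sq_answer(data, phi, epsilon, rng):
    """Empirical mean of phi (values in [0, 1]) plus Laplace noise of scale 1 / (epsilon * n)."""
    n = len(data)
    empirical = sum(phi(x) for x in data) / n
    # Sensitivity of the empirical mean to one replaced element is at most 1 / n.
    return empirical + laplace_noise(1.0 / (epsilon * n), rng)

rng = random.Random(0)
data = [rng.random() for _ in range(10_000)]  # hypothetical dataset: uniform values in [0, 1)
answer = private_sq_answer(data, lambda x: float(x > 0.5), epsilon=0.1, rng=rng)
print(f"private answer: {answer:.4f}")
```

With n = 10,000 and ε = 0.1 the noise scale is 0.001, well below the sampling error of about 0.005, so privacy costs little accuracy at this sample size.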

 

 

 Slide13

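DP composes adaptively

[Diagram: a sequence of adaptively chosen algorithms, the i-th of which is DP]

Slide14

DP composes adaptively

[Diagram: the adaptively chosen DP algorithms, continued]

Slide15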

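DP composes adaptively

Composition of $k$ $\epsilon$-DP algorithms: for every $\delta > 0$, is $(\epsilon\sqrt{2k\ln(1/\delta)} + k\epsilon(e^{\epsilon} - 1),\ \delta)$-DP [Dwork, Rothblum, Vadhan 10]

DP implies generalization

$\epsilon$-DP upper-bounds the increase in probability of any “bad” event
$(\epsilon, \delta)$-DP ensures generalization for SQs [DFHPRR 14]
$(\epsilon, \delta)$-DP case strengthened/extended [BNSSSU 15]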
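To get a feel for what adaptive composition buys, the sketch below compares the basic bound (k ε-DP algorithms compose to kε-DP) with the advanced composition bound of [Dwork, Rothblum, Vadhan 10] in its standard form, ε' = ε√(2k ln(1/δ)) + kε(e^ε − 1). The function names and the example parameters are assumptions made for illustration.

```python
import math

def basic_composition(epsilon, k):
    """k-fold composition of epsilon-DP algorithms is (k * epsilon)-DP."""
    return k * epsilon

def advanced_composition(epsilon, k, delta):
    """epsilon' such that the k-fold adaptive composition is (epsilon', delta)-DP."""
    return (epsilon * math.sqrt(2 * k * math.log(1 / delta))
            + k * epsilon * (math.exp(epsilon) - 1))

k, epsilon, delta = 1000, 0.01, 1e-6  # hypothetical parameters
basic = basic_composition(epsilon, k)
advanced = advanced_composition(epsilon, k, delta)
print(f"basic:    {basic:.3f}")
print(f"advanced: {advanced:.3f}")
```

For small ε the advanced bound grows roughly as √k rather than k, which is what lets many adaptive steps fit inside one privacy budget, at the price of a nonzero δ.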

 Slide16

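Description length

Composes adaptively
Preserves generalization (of subsequent analysis)

Let $A : X^n \to Y$ be an algorithm s.t. $|Y| \le 2^b$. Then for any event $R \subseteq X^n \times Y$, and $S \sim P^n$ for $P$ over $X$:
$\Pr[(S, A(S)) \in R] \le 2^b \cdot \max_{y \in Y} \Pr[(S, y) \in R]$

For any $A : X^n \to Y$ s.t. $|Y| \le 2^{b_1}$ and $B : X^n \times Y \to Y'$ s.t. $|Y'| \le 2^{b_2}$: Then for $C(S) = (A(S), B(S, A(S)))$ the description length is at most $b_1 + b_2$

Define […]. Then for […] gives […]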
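One way to see why a short output preserves generalization is a union bound over the possible outputs. The derivation below is a sketch in assumed notation: $A : X^n \to Y$ is the algorithm, $|Y| \le 2^b$ bounds the number of outputs, $R \subseteq X^n \times Y$ is any "bad" event, and the probability is over $S \sim P^n$ and any internal randomness of $A$.

```latex
\begin{align*}
\Pr\bigl[(S, A(S)) \in R\bigr]
  &= \sum_{y \in Y} \Pr\bigl[A(S) = y \ \wedge\ (S, y) \in R\bigr] \\
  &\le \sum_{y \in Y} \Pr\bigl[(S, y) \in R\bigr] \\
  &\le |Y| \cdot \max_{y \in Y} \Pr\bigl[(S, y) \in R\bigr]
   \;\le\; 2^{b} \cdot \max_{y \in Y} \Pr\bigl[(S, y) \in R\bigr].
\end{align*}
```

So if every fixed output $y$ makes the bad event exponentially unlikely, for example through a Hoeffding bound, then $b$ bits of output can inflate that probability by at most a factor of $2^b$.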

 Slide17

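Max-information

Max-information: $I_\infty(S; A(S)) = \log_2 \max_{s, y} \frac{\Pr[S = s \mid A(S) = y]}{\Pr[S = s]}$

For $A : X^n \to Y$ s.t. $I_\infty(S; A(S)) \le k$, where $S \sim P^n$ for $P$ over $X$:
Then for any event $R \subseteq X^n \times Y$, and $T \sim P^n$ independent of $S$:
$\Pr[(S, A(S)) \in R] \le 2^k \cdot \Pr[(S, A(T)) \in R]$

For any $A : X^n \to Y$ s.t. $I_\infty(S; A(S)) \le k_1$ and $B : X^n \times Y \to Y'$ s.t. $I_\infty(S; B(S, y)) \le k_2$ for every $y$:
Then for $C(S) = B(S, A(S))$: $I_\infty(S; C(S)) \le k_1 + k_2$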
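A small numerical sketch of the max-information idea on a toy discrete example. It treats a pair of random variables (X, Y) with an explicit joint pmf, where X stands in for the dataset and Y for the algorithm's output. It computes max-information in bits as the log of the largest ratio between the joint probability and the product of the marginals. It then checks by brute force that no event gains more than a factor of 2^k in probability when the output is dependent rather than independent. The joint table and all function names are hypothetical.

```python
import math
from itertools import chain, combinations

def marginals(joint):
    """Marginal pmfs of X and Y for a joint pmf given as {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def max_information(joint):
    """I_inf(X; Y) in bits: log2 of the largest ratio joint(x, y) / (px(x) * py(y))."""
    px, py = marginals(joint)
    return math.log2(max(p / (px[x] * py[y]) for (x, y), p in joint.items() if p > 0))

def worst_event_ratio(joint):
    """Largest Pr[(X, Y) in R] / Pr[(X', Y) in R] over nonempty events R,
    where X' is an independent copy of X."""
    px, py = marginals(joint)
    cells = list(joint)
    events = chain.from_iterable(combinations(cells, r) for r in range(1, len(cells) + 1))
    return max(
        sum(joint[c] for c in event) / sum(px[c[0]] * py[c[1]] for c in event)
        for event in events
    )

# Hypothetical joint pmf: the output Y agrees with X 80% of the time.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
k = max_information(joint)
ratio = worst_event_ratio(joint)
print(f"max-information: {k:.3f} bits, worst event ratio: {ratio:.3f}, 2^k: {2 ** k:.3f}")
```

The brute-force search covers all 15 nonempty events of the 2-by-2 table. The worst ratio is attained by a single cell and equals 2^k, so the bound is tight in this example.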

 Slide18

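Max-info from $\epsilon$-DP

By $\epsilon$-DP for any adjacent $S, S'$ and any $y$: $\Pr[A(S) = y] \le e^{\epsilon} \cdot \Pr[A(S') = y]$

For any $S, S'$ and any $y$: $\Pr[A(S) = y] \le e^{\epsilon n} \cdot \Pr[A(S') = y]$, and so $\Pr[A(S) = y \mid S = s] \le e^{\epsilon n} \cdot \Pr[A(S) = y]$

Thus $I_\infty(S; A(S)) \le \log_2(e) \cdot \epsilon n$

For concentration event need […] and so require […]

For […] use concentration of divergence: […]

Implies that […] suffices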
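The trade-off on this slide can be made concrete with a rough calculation. The sketch below rests on three assumptions: the pure bound is log2(e)·εn bits; the approximate bound has the form log2(e)·(ε²n/2 + ε√(n ln(2/β)/2)) bits, recalled from [DFHPRR 15], so its constants should be treated as an assumption; and the "bad" event is a deviation of more than τ in a [0, 1]-valued empirical mean, which Hoeffding's inequality bounds by 2·exp(−2τ²n) for a non-adaptive query. All function names and parameter values are hypothetical.

```python
import math

def pure_dp_max_info_bits(epsilon, n):
    """Max-information bound (bits) for an epsilon-DP algorithm on n samples."""
    return math.log2(math.e) * epsilon * n

def approx_max_info_bits(epsilon, n, beta):
    """Assumed form of the beta-approximate max-information bound (bits) for epsilon-DP."""
    nats = epsilon ** 2 * n / 2 + epsilon * math.sqrt(n * math.log(2 / beta) / 2)
    return math.log2(math.e) * nats

def inflated_failure_prob(k_bits, tau, n, beta=0.0):
    """2^k * (Hoeffding bound for deviation tau) + beta, computed in log space and capped at 1."""
    log_p = k_bits * math.log(2) + math.log(2) - 2 * tau ** 2 * n
    return 1.0 if log_p >= 0 else min(1.0, math.exp(log_p) + beta)

n, tau = 10_000, 0.1  # hypothetical sample size and tolerance
pure_large = inflated_failure_prob(pure_dp_max_info_bits(tau, n), tau, n)
pure_small = inflated_failure_prob(pure_dp_max_info_bits(tau ** 2, n), tau, n)
approx = inflated_failure_prob(approx_max_info_bits(tau, n, 1e-6), tau, n, beta=1e-6)
print(pure_large, pure_small, approx)
```

Under these assumptions, the pure bound with ε = τ is vacuous, because the inflated probability is capped at 1. It becomes meaningful only when ε is on the order of τ². The approximate bound already gives a small failure probability at ε = τ, at the cost of the additive β.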

 Slide19

Approximate max-information:

Preserves generalization
Composes adaptively

$\epsilon$-differential privacy

Description length

 Slide20

Further developments

Additional approaches:
Mutual information [Russo, Zou 2016]
KL-stability, TV-stability [Bassily, Nissim, Smith, Steinke, Stemmer, Ullman 2016]
Typical stability [Bassily, Freund 2016]

$(\epsilon, \delta)$-Differential privacy and max-information [Rogers, Roth, Smith, Thakkar 2016]

 Slide21

Conclusions

Adaptive data analysis:
Ubiquitous in practice
Invalidates standard generalization guarantees
Possible to model and improve on standard approaches

New theoretical approaches are needed
Max-info gives a general approach that captures differential privacy and description length
Known results can be rederived via max-info

Adaptive composition requires strong assumptions
Possibly too strong for some applications

Applications
Reusable holdout [DFHPRR 15]
Reusable holdout in ML competitions [Blum, Hardt 15]
Selection problems [Russo, Zou 16]