Understanding Generalization - PowerPoint Presentation

Uploaded On 2017-09-22

Presentation Transcript

Slide1

Understanding Generalization in Adaptive Data Analysis

Vitaly Feldman

Slide2

Overview

Adaptive data analysis
Motivation
Definitions
Basic techniques
With Dwork, Hardt, Pitassi, Reingold, Roth [DFHPRR 14,15]
New results [F, Steinke 17]
Open problems

Slide3

Learning problem

[Diagram: Learning problem. Labels: Distribution over domain; Data; Analysis (XGBoost, SVRG, Adagrad, SVM); Model; "=?".]

Slide4

Statistical inference

[Diagram: i.i.d. samples from the distribution are given to an Algorithm; theory provides generalization guarantees for its output.]

Theory:
Model complexity
Rademacher complexity
Stability
Online-to-batch

Slide5

Data analysis is adaptive

Exploratory data analysis
Feature selection
Model stacking
Hyper-parameter tuning
Shared datasets
…

Steps depend on previous analyses of the same dataset

[Diagram: Data analyst(s) working with the same dataset over several steps.]

Slide6

“Quiet scandal of statistics” [Leo Breiman, 1992]

Thou shalt not test hypotheses suggested by data

Slide7

ML practice

[Diagram: the data is split into Training and Testing parts; methods shown: Lasso, k-NN, SVM, C4.5, Kernels; the test error of the trained model is reported.]

Slide8

ML practice now

[Diagram: the data is split into Training, Validation and Testing parts; methods shown: XGBoost, SVRG, Tensorflow; the test error of the trained model is reported.]

Slide9

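Adaptive data analysis [DFHPRR 14]

[Diagram: Data analyst(s) submit analyses to an Algorithm that holds the dataset and returns answers.]

Goal: given the dataset $S$, compute values $v_i$ that are “close” to the result of running the $i$-th analysis on fresh samples

Each analysis is a query

Design an algorithm for answering adaptively-chosen queries

Slide10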

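Adaptive statistical queries

Statistical query oracle [Kearns 93]: the data analyst(s) submit queries $\phi_i : X \to [0,1]$ over the domain $X$ and receive answers $v_i$ such that $|v_i - \mathbf{E}_{x \sim P}[\phi_i(x)]| \le \tau$ with prob. $1 - \beta$, where $P$ is the unknown distribution over $X$

Example: [query formula not preserved in the transcript]

Can measure correlations, moments, accuracy/loss

Run any statistical query algorithm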
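As a minimal sketch of the statistical query interface, the Python below answers a query with its empirical mean over a sample. The data generator, the threshold classifier, and the names `sq_oracle`, `loss_query` and `moment_query` are illustrative assumptions and do not come from the talk; an actual oracle only promises an answer within some tolerance of the true expectation.

```python
import random

def sq_oracle(samples, query):
    """Answer a statistical query with the empirical mean of query(z) over the samples."""
    return sum(query(z) for z in samples) / len(samples)

rng = random.Random(0)

def draw_example():
    """Hypothetical labeled point: x uniform in [0, 1], label 1 iff x > 0.5, flipped w.p. 0.1."""
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    return (x, y)

samples = [draw_example() for _ in range(20000)]

def classifier(x):
    return int(x > 0.5)

# Two queries with values in [0, 1]: the 0/1 loss of a classifier and a second moment.
loss_query = lambda z: float(classifier(z[0]) != z[1])
moment_query = lambda z: z[0] ** 2

est_loss = sq_oracle(samples, loss_query)      # true expectation is 0.1
est_moment = sq_oracle(samples, moment_query)  # true expectation is 1/3
```

Both queries are fixed before looking at the data, so their empirical means concentrate around the true expectations.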

 Slide11

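Answering non-adaptive SQs

Given $k$ non-adaptive query functions $\phi_1, \dots, \phi_k$ and $n$ i.i.d. samples from $P$, estimate $\mathbf{E}_{x \sim P}[\phi_i(x)]$

Use empirical mean: $\mathbf{E}_S[\phi_i] = \frac{1}{n} \sum_{x \in S} \phi_i(x)$

$n = O\left(\frac{\log(k/\beta)}{\tau^2}\right)$ samples suffice for accuracy $\tau$ with prob. $1 - \beta$

Slide12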

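Answering adaptively-chosen SQs

What if we use the empirical mean $\mathbf{E}_S[\phi_i]$?

For some constant $\beta > 0$, answering $k$ adaptively-chosen queries with accuracy $\tau$ and prob. $1 - \beta$ requires $n = \Omega\left(\frac{k}{\tau^2}\right)$ samples

Variable selection, boosting, bagging, step-wise regression ..

Data splitting: $n = O\left(\frac{k \log k}{\tau^2}\right)$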
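The failure of empirical means under adaptivity can be illustrated with a small variable-selection experiment. This is a sketch under assumed settings (random ±1 features, labels independent of the features, hypothetical sizes `n` and `d`), not an experiment from the talk: the first round asks for the correlation of every feature with the label, and the second round asks for the accuracy of a classifier built from those answers.

```python
import random

rng = random.Random(1)
n, d = 200, 400  # hypothetical sample size and number of features

def draw(m):
    """m points with d uniform +-1 features and an independent uniform +-1 label."""
    return [([rng.choice((-1, 1)) for _ in range(d)], rng.choice((-1, 1)))
            for _ in range(m)]

def empirical_mean(samples, query):
    return sum(query(z) for z in samples) / len(samples)

S = draw(n)

# Round 1: d queries, the correlation of each feature with the label.
corr = [empirical_mean(S, lambda z, j=j: z[0][j] * z[1]) for j in range(d)]

# Round 2: one adaptive query that depends on the round-1 answers.
signs = [1 if c >= 0 else -1 for c in corr]

def accuracy_query(z):
    """1.0 if the sign-weighted vote of the features matches the label, else 0.0."""
    x, y = z
    vote = sum(s * xj for s, xj in zip(signs, x))
    return 1.0 if (1 if vote >= 0 else -1) == y else 0.0

reused = empirical_mean(S, accuracy_query)          # same data: looks much better than chance
fresh = empirical_mean(draw(2000), accuracy_query)  # fresh data: close to 0.5
```

The labels carry no signal, so the true accuracy of any classifier is 1/2; the gap between `reused` and `fresh` comes entirely from reusing the same sample for both rounds.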

 Slide13

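Answering adaptive SQs

[DFHPRR 14] There exists an algorithm that can answer $k$ adaptively chosen SQs with accuracy $\tau$ for $n = \tilde{O}\left(\frac{\sqrt{k}}{\tau^{2.5}}\right)$

Data splitting: $O\left(\frac{k \log k}{\tau^2}\right)$

[Bassily,Nissim,Smith,Steinke,Stemmer,Ullman 15] $n = \tilde{O}\left(\frac{\sqrt{k}}{\tau^2}\right)$

Generalizes to low-sensitivity analyses: $|\phi(S) - \phi(S')| \le \frac{1}{n}$ when $S, S'$ differ in a single element

Estimates $\mathbf{E}_{S \sim P^n}[\phi(S)]$ within $\tau$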
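To get a feel for the gap between data splitting and the differential-privacy-based bounds, the sketch below compares the two scalings, roughly $k \log k / \tau^2$ against $\sqrt{k} / \tau^2$. Constants and logarithmic factors hidden by the $\tilde{O}$ notation are dropped, so the numbers are only an order-of-magnitude illustration; the function names are hypothetical.

```python
import math

def data_splitting_samples(k, tau):
    """Order of k * log(k) / tau^2: a fresh batch of samples for each of k queries."""
    return k * math.log(k) / tau ** 2

def dp_based_samples(k, tau):
    """Order of sqrt(k) / tau^2, with constants and logarithmic factors dropped."""
    return math.sqrt(k) / tau ** 2

k, tau = 10_000, 0.1
split_n = data_splitting_samples(k, tau)
dp_n = dp_based_samples(k, tau)
ratio = split_n / dp_n  # equals sqrt(k) * log(k)
```

For ten thousand queries at accuracy 0.1, the first scaling is in the millions of samples while the second is in the tens of thousands.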

 Slide14

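Differential privacy [Dwork,McSherry,Nissim,Smith 06]

Randomized algorithm $M$ is $(\varepsilon, \delta)$-differentially private if for any two data sets $S, S'$ that differ in one element:

$\forall Z \subseteq \mathrm{range}(M): \quad \Pr[M(S) \in Z] \le e^{\varepsilon} \cdot \Pr[M(S') \in Z] + \delta$

[Diagram: output distributions of $M$ on the two data sets; ratio bounded.]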
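The bounded-ratio condition can be checked numerically on a standard example, the Laplace mechanism for a counting query. This is an illustrative sketch rather than anything shown in the talk: the data sets, the predicate and the function names are assumptions. A count changes by at most 1 when one element changes, so Laplace noise of scale $1/\varepsilon$ keeps the density ratio below $e^{\varepsilon}$ (the case $\delta = 0$).

```python
import math
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(data, predicate, epsilon, rng):
    """Noisy count; the count has sensitivity 1, so the noise scale is 1 / epsilon."""
    return sum(1 for x in data if predicate(x)) + laplace_noise(1.0 / epsilon, rng)

def laplace_density(z, center, scale):
    return math.exp(-abs(z - center) / scale) / (2.0 * scale)

epsilon = 0.5
S = [3, 8, 1, 9, 4]
S_prime = [3, 8, 1, 2, 4]  # differs from S in one element
is_large = lambda x: x > 5

c = sum(1 for x in S if is_large(x))              # exact count on S
c_prime = sum(1 for x in S_prime if is_large(x))  # exact count on S'

# Largest ratio of output densities, in either direction, over a grid of outputs.
grid = [i / 10.0 for i in range(-100, 101)]
scale = 1.0 / epsilon
max_ratio = max(
    max(laplace_density(z, c, scale) / laplace_density(z, c_prime, scale),
        laplace_density(z, c_prime, scale) / laplace_density(z, c, scale))
    for z in grid
)

noisy = private_count(S, is_large, epsilon, random.Random(0))
```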

 

 Slide15

DP composes adaptively

Composition of $k$ $\varepsilon$-DP algorithms: for every $\delta > 0$, is $\left(\varepsilon \sqrt{2k \ln(1/\delta)} + k\varepsilon(e^{\varepsilon} - 1), \delta\right)$-DP [Dwork,Rothblum,Vadhan 10]

DP implies generalization

Differential privacy is stability
Implies strongly uniform replace-one stability and generalization in expectation
DP implies generalization with high probability [DFHPRR 14, BNSSSU 15]

Slide16

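Value perturbation [DMNS 06]

Answer low-sensitivity query $\phi$ with $\phi(S) + \zeta$, where $\zeta$ is Gaussian noise

Given $n$ samples achieves error $\approx \Delta(\phi) \cdot \sqrt{n} \cdot k^{1/4}$ for $k$ queries, where $\Delta(\phi)$ is the worst-case sensitivity: $\max |\phi(S) - \phi(S')|$ over $S, S'$ that differ in a single element

$\Delta(\phi) \cdot \sqrt{n}$ could be much larger than standard deviation of $\phi(S)$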
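A minimal sketch of value perturbation for statistical queries: each answer is the empirical mean plus independent Gaussian noise. The class name, the noise level `sigma` and the example queries are assumptions made for illustration; the sketch does not perform the privacy accounting that would be needed to choose `sigma` for a given number of queries.

```python
import random

class GaussianValuePerturbation:
    """Answers each query with its empirical mean plus N(0, sigma^2) noise."""

    def __init__(self, samples, sigma, rng):
        self.samples = samples
        self.sigma = sigma  # assumed noise level, not derived from a privacy budget
        self.rng = rng

    def answer(self, query):
        mean = sum(query(x) for x in self.samples) / len(self.samples)
        return mean + self.rng.gauss(0.0, self.sigma)

rng = random.Random(2)
samples = [rng.random() for _ in range(10000)]  # hypothetical data: uniform on [0, 1]
mechanism = GaussianValuePerturbation(samples, sigma=0.01, rng=rng)

# First query: the mean of x. Second query is chosen adaptively from the first answer.
v1 = mechanism.answer(lambda x: x)
v2 = mechanism.answer(lambda x: 1.0 if x > v1 else 0.0)
```

For uniform data the second query has true expectation `1 - v1`, so both answers can be compared against known values.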

 

 Slide17

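Beyond low-sensitivity

[F, Steinke 17] There exists an algorithm that for any adaptively-chosen sequence of estimators $\phi_1, \dots, \phi_k : X^t \to \mathbb{R}$, given $n = \tilde{O}(\sqrt{k}) \cdot t$ i.i.d. samples from $P$, outputs values $v_1, \dots, v_k$ such that w.h.p. for all $i$:

$|\mathbf{E}_{Z \sim P^t}[\phi_i(Z)] - v_i| \le 2\sigma_i$, where $\sigma_i = \sqrt{\mathbf{Var}_{Z \sim P^t}[\phi_i(Z)]}$

For statistical queries: given $n$ samples get error that scales as $\frac{\sqrt{\mathbf{Var}_{x \sim P}[\phi_i(x)]}}{\sqrt{n}} \cdot k^{1/4}$

Value perturbation: $\frac{1}{\sqrt{n}} \cdot k^{1/4}$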

 Slide18

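Stable Median

[Diagram: the dataset $S$ is split into disjoint subsamples and the query is evaluated on each of them, giving one value per subsample.]

Find an approximate median of these values with DP relative to $S$: a value greater than the bottom 1/3 and smaller than the top 1/3 of the values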

 

 

 

 

 Slide19

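Median algorithms

Requires discretization: ground set $T$, $|T| = r$

Upper bound: $2^{O(\log^* r)}$ samples
Lower bound: $\Omega(\log^* r)$ samples
[Bun,Nissim,Stemmer,Vadhan 15]

Exponential mechanism [McSherry, Talwar 07]
Output $v \in T$ with prob. that decays exponentially in the distance between the rank of $v$ among the values and the middle rank
Uses $O\left(\frac{\log r}{\varepsilon}\right)$ samples

Stability and confidence amplification for the price of one logarithmic factor!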
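The median-based approach can be sketched as follows: split the sample into chunks, evaluate the estimator on each chunk, and pick an approximate median of the chunk values with the exponential mechanism over a discretized grid. This is an illustrative simplification, not the algorithm from the paper: the grid, the chunk size `t`, the value of `epsilon`, the utility function and all function names are assumptions.

```python
import math
import random

def stable_median(values, grid, epsilon, rng):
    """Exponential mechanism: favour grid points whose rank among `values` is near the middle."""
    m = len(values)

    def utility(v):
        below = sum(1 for y in values if y <= v)
        return -abs(below - m / 2.0)  # changes by at most 1 when one value changes

    weights = [math.exp(epsilon * utility(v) / 2.0) for v in grid]
    return rng.choices(grid, weights=weights, k=1)[0]

def answer_estimator(samples, estimator, t, grid, epsilon, rng):
    """Evaluate `estimator` on disjoint chunks of size t and return a stable median."""
    chunks = [samples[i:i + t] for i in range(0, len(samples) - t + 1, t)]
    values = [estimator(chunk) for chunk in chunks]
    return stable_median(values, grid, epsilon, rng)

rng = random.Random(3)
samples = [rng.random() for _ in range(5000)]  # hypothetical data: uniform on [0, 1]
grid = [i / 100.0 for i in range(101)]         # assumed discretization of [0, 1]
chunk_mean = lambda chunk: sum(chunk) / len(chunk)

estimate = answer_estimator(samples, chunk_mean, t=50, grid=grid, epsilon=1.0, rng=rng)
```

Changing one sample changes only one chunk value, which is why the rank-based utility has sensitivity 1 with respect to the whole dataset.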

 

 

 Slide20

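Analysis

Differential privacy approximately preserves quantiles

[F, Steinke 17] Let $M$ be a DP algorithm that on input $S$ outputs a function $\phi$ and a value $v$. Then w.h.p. over $S$ and $(\phi, v) = M(S)$, the quantile of $v$ among the values of $\phi$ on $S$ is close to its quantile among the values of $\phi$ on the distribution

If $v$ is within $[1/3, 2/3]$ empirical quantiles then $v$ is within $[1/4, 3/4]$ true quantiles
Hence, by Chebyshev's inequality, $v$ is within mean $\pm 2\sigma$

If $\phi$ is well-concentrated on the distribution then easy to prove high probability bounds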

 Slide21

Limits

Any algorithm for answering $k$ adaptively chosen SQs with accuracy $\tau$ requires* $n = \Omega\left(\frac{\sqrt{k}}{\tau}\right)$ samples [Hardt, Ullman 14; Steinke, Ullman 15]
*in sufficiently high dimension or under crypto assumptions

Verification of responses to queries: $n = O(\sqrt{c} \cdot \log k)$, where $c$ is the number of queries that failed verification
Data splitting if overfitting [DFHPRR 14]
Reusable holdout [DFHPRR 15]
Maintaining public leaderboard in a competition [Blum, Hardt 15]
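The reusable holdout idea can be sketched in the spirit of the Thresholdout algorithm from [DFHPRR 15]: answer with the training-set mean unless it disagrees with the holdout mean by more than a noisy threshold, and only then reveal a noisy holdout mean. This is a simplified sketch under assumed parameters; the budget accounting for failed verifications is omitted, and `threshold`, `sigma` and the example queries are hypothetical.

```python
import random

def laplace(scale, rng):
    """Laplace(0, scale) sample: the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def thresholdout(train, holdout, query, threshold, sigma, rng):
    """Return the training mean if it agrees with the holdout, else a noisy holdout mean."""
    train_mean = sum(query(x) for x in train) / len(train)
    holdout_mean = sum(query(x) for x in holdout) / len(holdout)
    if abs(train_mean - holdout_mean) > threshold + laplace(sigma, rng):
        return holdout_mean + laplace(sigma, rng)  # verification failed
    return train_mean                              # verification passed

rng = random.Random(4)
train = list(range(0, 1000))       # hypothetical data points
holdout = list(range(1000, 2000))

# A query that generalizes: the fraction of even points is 0.5 on both sets.
honest_query = lambda x: 1.0 if x % 2 == 0 else 0.0
# A query that overfits: it memorizes the training set.
train_set = set(train)
overfit_query = lambda x: 1.0 if x in train_set else 0.0

honest_answer = thresholdout(train, holdout, honest_query, 0.04, 0.01, rng)
overfit_answer = thresholdout(train, holdout, overfit_query, 0.04, 0.01, rng)
```

The memorizing query scores 1.0 on the training set, but the mechanism returns a value near its holdout mean of 0 instead.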

 

Slide22

Open problems

Analysts without side information about the distribution $P$
Queries depend only on previous answers
Fixed “natural” analyst/Learning algorithm
Gradient descent for stochastic convex optimization

Does there exist an SQ analyst whose queries require more than $O(\log k)$ samples to answer? (with constant accuracy/confidence)

 Slide23

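Stochastic convex optimization

Convex body $K \subseteq \mathbb{R}^d$
Class $F$ of convex 1-Lipschitz functions on $K$

Given $f_1, \dots, f_n$ sampled i.i.d. from unknown $P$ over $F$

Minimize true (expected) objective $f_P(x) = \mathbf{E}_{f \sim P}[f(x)]$ over $K$:

Find $\bar{x}$ s.t. $f_P(\bar{x}) \le \min_{x \in K} f_P(x) + \varepsilon$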

 

 

 

 Slide24

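Gradient descent

ERM via projected gradient descent: $f_S(x) = \frac{1}{n} \sum_{i} f_i(x)$
Initialize $x^1 \in K$
For $t = 1$ to $T$: $x^{t+1} = \mathrm{Project}_K\left(x^t - \eta \cdot \nabla f_S(x^t)\right)$
Output: $\frac{1}{T} \sum_{t} x^t$

Fresh samples: $\|\nabla f_S(x^t) - \nabla f_P(x^t)\|_2 \lesssim 1/\sqrt{n}$

Overall: $d/\varepsilon^2$ statistical queries with accuracy $\varepsilon$ in $1/\varepsilon^2$ adaptive rounds
Sample splitting: $O(\log d / \varepsilon^4)$ samples
DP: $\tilde{O}(\sqrt{d} / \varepsilon^3)$ samples

Sample complexity is unknown
Uniform convergence: $O(d/\varepsilon^2)$ samples (tight [F. 16])
SGD solves using $O(1/\varepsilon^2)$ samples [Robbins,Monro 51; Polyak 90]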
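Projected gradient descent on the empirical objective can be sketched as below, taking the convex body to be the unit Euclidean ball and using linear losses $f_i(x) = -\langle a_i, x \rangle$ with $\|a_i\|_2 \le 1$ so that the minimum is known in closed form. The ball, the losses, the step size $1/\sqrt{T}$ and all names are assumptions chosen for illustration.

```python
import math

def project_to_unit_ball(x):
    """Euclidean projection onto the unit ball."""
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= 1.0 else [v / norm for v in x]

def projected_gradient_descent(gradient, dim, steps, step_size):
    """Run projected gradient descent from the origin and return the average iterate."""
    x = [0.0] * dim
    total = [0.0] * dim
    for _ in range(steps):
        total = [s + v for s, v in zip(total, x)]
        g = gradient(x)
        x = project_to_unit_ball([v - step_size * gv for v, gv in zip(x, g)])
    return [s / steps for s in total]

# Hypothetical sample of linear losses f_i(x) = -<a_i, x>, each convex and 1-Lipschitz.
A = [(1.0, 0.0), (0.6, 0.8), (0.8, 0.6)]
mean_a = [sum(a[j] for a in A) / len(A) for j in range(2)]

def empirical_objective(x):
    return -sum(m * v for m, v in zip(mean_a, x))

def empirical_gradient(x):
    return [-m for m in mean_a]  # the gradient of a linear objective is constant

T = 400
x_bar = projected_gradient_descent(empirical_gradient, 2, T, 1.0 / math.sqrt(T))

optimum = -math.sqrt(sum(m * m for m in mean_a))  # attained at mean_a / ||mean_a||
gap = empirical_objective(x_bar) - optimum
```

With step size $1/\sqrt{T}$ the standard analysis bounds the gap of the average iterate by roughly $1/\sqrt{T}$, which is 0.05 for $T = 400$. Every step evaluates the gradient at a point that depends on earlier gradients, which is what makes the queries adaptive.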

 Slide25

Conclusions

Real-valued analyses (without any assumptions)
Going beyond tools from DP
Other notions of stability for outcomes
Max/mutual information
Generalization beyond uniform convergence
Using these techniques in practice