Learning and Testing Junta Distributions
Uploaded On 2017-05-10

Presentation Transcript

Slide1

Learning and Testing Junta Distributions

Maryam Aliakbarpour (MIT)

Joint work with: Eric Blais (U Waterloo) and Ronitt Rubinfeld (MIT and TAU)

Slide2

The Problem

Slide3

Relevant features

[Figure: a list of binary features (for example "Smokes", "Does not regularly exercise", "Gender: male") describing a distribution over heart attack patients. Some features correlate with heart attack (the junta coordinates); the features irrelevant to heart attack are distributed the same.]

Slide4

Relevant features

[Figure: the relevant features (heart attack correlates) are shown as non-smoker vs. smoker and exercises vs. does not exercise; the remaining features are irrelevant to heart attack.]

Assumption: Irrelevant features are uniformly distributed.

Slide5

Problem Definition

We call $P$ a $k$-junta distribution on the set $J \subseteq [n]$, where $|J| = k$, if for any two vectors $x$ and $y$ such that $x_J = y_J$, we have $P(x) = P(y)$.

Observe that [...].
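The observation on this slide did not survive extraction. As a hedged illustration of what the definition implies, and not necessarily the statement the slide made: assuming $P$ is a distribution over $\{0,1\}^n$ (the explicit domain and the symbol $z$ are assumptions introduced here), all $2^{n-k}$ vectors that agree on the coordinates in $J$ have the same probability, so the probability of a vector is determined by the probability of its setting on $J$.

```latex
% Assumptions: P is a k-junta distribution over \{0,1\}^n on the set J, |J| = k.
% Every y with y_J = x_J satisfies P(y) = P(x), and there are 2^{n-k} such y.
\[
  \Pr_{z \sim P}\bigl[z_J = x_J\bigr]
  \;=\; \sum_{y \,:\, y_J = x_J} P(y)
  \;=\; 2^{\,n-k}\, P(x)
  \qquad\Longrightarrow\qquad
  P(x) \;=\; \frac{\Pr_{z \sim P}\bigl[z_J = x_J\bigr]}{2^{\,n-k}} .
\]
```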

 

 

 

 

 

Weight

Slide6

Relevant features

[Figure: the relevant features are the heart attack correlates; the remaining features are irrelevant to heart attack and uniformly distributed.]

Testing problem: Is there a small such set?

Learning problem: Which set is it?

Slide7

Related work

Feature selection: Guyon-Elisseeff’03, Liu-Motoda’12, and Chandrashekar-Sahin’14.

Junta functions: A. Blum’94 and A. Blum-Langley’97, ..., Blais’09, G. Valiant’12.

Property testing of distributions: GR00, BFR+00, BFF+01, Bat01, BDKR02, BKR04, Val08, Pan08, Val11, DDS+13, ADJ+11, LRR11, ILR12, CDVV14, VV14, DKN15b, DKN15a, ADK15, and CDGR16.

Testing properties of collections of distributions: Levi-Ron-Rubinfeld’13 and Diakonikolas-Kane’16.

Slide8

Our results

Learning

[Table: sample complexity (lower bound, upper bound) and running time (upper bound) for the cover method and for our algorithm. The bounds themselves are not preserved in this transcript.]

Slide9

Our results

Testing

[Table: sample complexity (lower bound, upper bound). The bounds themselves are not preserved in this transcript.]

Slide10

Learning Algorithm

Slide11

PAC learning

Learning $k$-junta distributions: Given that $P$ is a $k$-junta distribution, the learner outputs a $k$-junta distribution that is $\epsilon$-close to $P$ in total variation distance.
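The slide's own formula is not preserved here. As a minimal sketch, the standard definition of total variation distance, $\frac{1}{2}\sum_x |P(x) - Q(x)|$, can be computed as follows. The function name and the dictionary representation of a PMF are assumptions made for illustration.

```python
def total_variation(p, q):
    """Total variation distance between two PMFs given as dicts: outcome -> probability."""
    support = set(p) | set(q)
    # Standard definition: half of the L1 distance between the two PMFs.
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


# Hypothetical example over {0,1}^2, outcomes written as strings.
p = {"00": 0.5, "01": 0.5}
q = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
print(total_variation(p, q))
```

A learner meeting the requirement above would output a distribution whose distance to $P$ under this measure is at most $\epsilon$.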

 

Slide12

Theorem

There exists an [...]-learner for $k$-junta distributions using [...] samples.

Slide13

 

Overview of the Fourier analysis

PMF of [...]

For any $S \subseteq [n]$: [...]

Parity function: $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$
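The slide's formulas are lost, so the following is only a sketch of the general idea: the expectation of a parity function under $P$ can be estimated by averaging that parity over samples drawn from $P$. The usual Fourier coefficient of the PMF equals this expectation up to a normalization factor, which is an assumption here because the slide's normalization is not preserved. The function names and the example distribution are hypothetical.

```python
import random


def parity(x, S):
    """chi_S(x) = (-1)^(sum of x_i for i in S), for x in {0,1}^n."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def estimate_parity_mean(samples, S):
    """Empirical mean of chi_S over the samples, an estimate of E_{x~P}[chi_S(x)]."""
    return sum(parity(x, S) for x in samples) / len(samples)


rng = random.Random(0)
# Hypothetical distribution on {0,1}^4: bit 0 is 1 with probability 0.9,
# bits 1..3 are uniform, so E[chi_{0}] = 0.1 - 0.9 = -0.8 and E[chi_{1}] = 0.
samples = [
    (int(rng.random() < 0.9),) + tuple(rng.randrange(2) for _ in range(3))
    for _ in range(20000)
]
print(estimate_parity_mean(samples, (0,)))
print(estimate_parity_mean(samples, (1,)))
```

Parities that touch a uniformly distributed coordinate average out to roughly zero, which is what makes these estimates informative about the relevant coordinates.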

We can estimate!

Slide14

Lemma 1

$P$ is a $k$-junta distribution on the set $J$.

For any subset $S$ s.t. $S \not\subseteq J$, $\hat{P}(S)$ is zero.

For any [...] of size $k$: [...].

Corollary: [...]

Slide15

Lemma 2

If $P$ is a $k$-junta distribution on the set [...] but it is $\epsilon$-far from being a $k$-junta distribution on the set [...], then [...].

Slide16

 

Estimating [...] is enough!

Accurate Estimation: [...]

Slide17

Proof sketch of Lemma 2

For any [...] define [...].

Recall: [...]

Closest junta to [...] on the set [...]: [...]

Slide18

Learning Algorithm

For every subset of size $k$: estimate [...].

Output the subset that maximizes [...].

Output the estimate of the biases of every setting on the coordinates of that subset.
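The steps of the learning algorithm can be sketched as below. The quantity being estimated and maximized is lost from this transcript, so the score used here (the sum of squared empirical parity means over all subsets of the candidate set) is an assumption chosen because it is largest, in expectation, on the true junta coordinates. All function names and the sample generator are hypothetical.

```python
import itertools
import random
from collections import Counter


def parity(x, S):
    """chi_S(x) = (-1)^(sum of x_i for i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1


def score(samples, J):
    """Assumed statistic: sum over T subset of J of (empirical mean of chi_T)^2."""
    m = len(samples)
    total = 0.0
    for r in range(len(J) + 1):
        for T in itertools.combinations(J, r):
            mean = sum(parity(x, T) for x in samples) / m
            total += mean * mean
    return total


def learn_junta(samples, n, k):
    """Pick the size-k subset with the largest score, then estimate its biases."""
    best = max(itertools.combinations(range(n), k), key=lambda J: score(samples, J))
    counts = Counter(tuple(x[i] for i in best) for x in samples)
    biases = {setting: c / len(samples) for setting, c in counts.items()}
    return best, biases


def sample_junta(rng, n, J, weights):
    """Hypothetical generator: setting on J drawn from `weights`, other bits uniform."""
    settings = list(weights)
    setting = rng.choices(settings, weights=[weights[s] for s in settings])[0]
    x = [rng.randrange(2) for _ in range(n)]
    for i, bit in zip(J, setting):
        x[i] = bit
    return tuple(x)


rng = random.Random(1)
weights = {(0, 0): 0.7, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.1}
samples = [sample_junta(rng, 6, (1, 4), weights) for _ in range(3000)]
J_hat, biases = learn_junta(samples, n=6, k=2)
print(J_hat, biases)
```

Enumerating every subset of size $k$ is what drives the running time of this brute-force sketch; it is meant to show the structure of the procedure, not an efficient implementation.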

 

 

Slide19

Testing Algorithm

Slide20

What does it mean to test?

Testing $k$-junta distributions:

If $P$ is a $k$-junta distribution, accept with probability at least 2/3.

If $P$ is $\epsilon$-far from being a $k$-junta distribution, reject with probability at least 2/3.

Slide21

Theorem

There exists an [...]-tester for $k$-junta distributions using [...] samples.

Slide22

View $P$ as a collection

[Figure: not preserved in this transcript.]

Slide23

Reduction

$P$ is a junta distribution on $J$ if and only if the collection obtained by partitioning the domain based on $J$ is a collection of uniform distributions.

Slide24

Testing Algorithm

For every subset $J$ of size $k$:

Partition the domain based on $J$ and view $P$ as the collection of distributions [...].

If the collection is a collection of uniform distributions, Accept.

If no subset is accepted, Reject.
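The partition step can be sketched on samples as follows: each sample is filed under its setting on the coordinates in the candidate set, and only its remaining coordinates are kept. This is a minimal illustration; the function name and the list-of-tuples representation are assumptions, and the uniformity check on the resulting collection is not included.

```python
from collections import defaultdict


def view_as_collection(samples, J):
    """Group samples by their setting on J, keeping only the coordinates outside J."""
    J = tuple(J)
    in_J = set(J)
    collection = defaultdict(list)
    for x in samples:
        setting = tuple(x[i] for i in J)
        rest = tuple(bit for i, bit in enumerate(x) if i not in in_J)
        collection[setting].append(rest)
    return dict(collection)


# Hypothetical samples from {0,1}^3 with candidate set J = {0}.
samples = [(0, 1, 1), (0, 0, 1), (1, 1, 0), (0, 1, 0)]
collection = view_as_collection(samples, J=(0,))
print(collection)
```

Each value in the returned dictionary is a list of samples from one member of the collection, which is the input a collection-uniformity test would consume.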

 

How do we test that a collection consists of uniform distributions?

Slide25

Testing collection of uniform distributions

[Figure: uniform distribution.]

Slide26

Testing collection of uniform distributions

Paninski’08 uniformity test:

Draw [...] samples.

Count the number of unique elements, [...], in the sample set.

If [...]: Reject.

Else: Accept.
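A rough sketch of a test in this style is given below. Several details are assumptions because the slide's sample size and threshold are lost: "unique elements" is read as the number of distinct values in the sample, `eps` is taken to be the L1 distance from uniform, and the threshold (the uniform expectation minus a collision-based margin) is an illustrative choice, not the constant from Paninski's analysis.

```python
import random


def expected_distinct_uniform(N, m):
    """Expected number of distinct values in m uniform draws from a domain of size N."""
    return N * (1.0 - (1.0 - 1.0 / N) ** m)


def uniformity_test(samples, N, eps):
    """Return True (accept) if the distinct count is close to the uniform expectation.

    The threshold is an assumption for illustration only.
    """
    m = len(samples)
    K = len(set(samples))  # number of distinct elements in the sample set
    threshold = expected_distinct_uniform(N, m) - m * (m - 1) * eps ** 2 / (4.0 * N)
    return K >= threshold


rng = random.Random(0)
N, m = 10000, 2000
uniform_samples = [rng.randrange(N) for _ in range(m)]
# Hypothetical far-from-uniform case: all mass on a quarter of the domain.
skewed_samples = [rng.randrange(N // 4) for _ in range(m)]
print(uniformity_test(uniform_samples, N, eps=1.0))
print(uniformity_test(skewed_samples, N, eps=1.0))
```

The design rests on the fact that a distribution far from uniform produces more repeated values, so a sample from it contains noticeably fewer distinct elements than a uniform sample of the same size.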

 

Slide27

Testing collection of uniform distributions

Our Algorithm:

Draw [...]’s.

Construct [...]’s from [...]’s.

[...]: number of unique elements among [...]’s.

[...]: number of unique elements among [...]’s.

If [...]: Accept.

Otherwise: Reject.

Slide28

Analysis

Paninski’08 uniformity test:

Gap between YES cases and NO cases: [...]

[...] is close to its expected value!

Bound the variance!

Slide29

Analysis

Ours:

Gap between YES cases and NO cases: [...]

[...] is close to its expected value!

It only works when [...] are within a constant factor of each other.

Slide30

Reduction to the special case

Partition distributions into [...] buckets.

If the collection is $\epsilon$-far from being uniform, the sub-collection in at least one of the buckets is [...]-far from being uniform.
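One common way to form such buckets is geometric bucketing, sketched below. It assumes the quantity being bucketed is a positive weight attached to each distribution in the collection, which is a guess since the symbol is lost from this transcript, and it uses a factor of 2 as the constant. The function name and the example weights are hypothetical.

```python
import math
from collections import defaultdict


def bucket_by_weight(weights):
    """Group keys so that all weights in one bucket are within a factor of 2.

    Bucket j holds the weights w with 2^-(j+1) < w <= 2^-j.
    """
    buckets = defaultdict(list)
    for key, w in weights.items():
        if w > 0:  # zero-weight entries never appear in samples, so skip them
            buckets[math.floor(-math.log2(w))].append(key)
    return dict(buckets)


# Hypothetical weights of four distributions in a collection.
weights = {"a": 0.5, "b": 0.3, "c": 0.12, "d": 0.08}
print(bucket_by_weight(weights))
```

With weights bounded below by some minimum value, this scheme produces a number of buckets that is only logarithmic in the inverse of that minimum, which keeps the per-bucket testing overhead small.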

 

Slide31

Testing uniformity of a collection of distributions (general case)

Estimate [...]’s.

Partition distributions such that [...]’s are within a constant factor of each other.

For each bucket: test that the sub-collection in the bucket is a set of uniform distributions. If the test rejects, Reject.

Accept.

Slide32

Conclusion

Summary:

Introduced junta distributions

How to learn junta distributions

How to test junta distributions

Future directions:

Tighter results

Removing the uniformity assumption

Slide33

Reference

Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.

Huan Liu and Hiroshi Motoda. Feature selection for knowledge discovery and data mining, volume 454. Springer Science & Business Media, 2012.

Girish Chandrashekar and Ferat Sahin. A survey on feature selection methods. Computers & Electrical Engineering, 40(1):16–28, 2014.

Avrim Blum. Relevant examples and relevant features: Thoughts from computational learning theory. In AAAI Fall Symposium on ‘Relevance’, volume 5, 1994.

Avrim Blum and Pat Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, 97(1-2):245–271, December 1997.

Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distributions. Theory of Computing, 9(8):295–347, 2013.

Ilias Diakonikolas and Daniel M. Kane. A new approach for testing properties of discrete distributions. CoRR, abs/1601.05557, 2016. URL http://arxiv.org/abs/1601.05557.

Slide34

Reference

Gregory Valiant. Finding correlations in subquadratic time, with applications to learning parities and juntas. FOCS, pages 11–20, 2012.

Eric Blais. Testing juntas nearly optimally. In Proc. 41st Symposium on Theory of Computing, pages 151–158, 2009.