Presentation Transcript

Slide 1

Nearly optimal classification for semimetrics

Lee-Ad Gottlieb (Ariel University), Aryeh Kontorovich (Ben-Gurion University), Pinhas Nisnevitch (Tel Aviv University)


Slides 2–4

Classification problem

A fundamental problem in learning:
- Point space X
- Probability distribution P on X × {-1, +1}
- The learner observes a sample S of n points (x, y) drawn i.i.d. from P
- and wants to predict the labels of other points in X.
- It produces a hypothesis h: X → {-1, +1} with an empirical error on S and a true error under P (see the definitions below).
- Goal: the true error should match the empirical error, uniformly over h, in probability.

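The error quantities referred to above are the standard ones; for concreteness, here is a sketch of the textbook definitions (the slide's own notation may differ):

$$
\widehat{\operatorname{err}}_S(h) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\,[\,h(x_i) \neq y_i\,],
\qquad
\operatorname{err}(h) = \Pr_{(x,y)\sim P}\,[\,h(x) \neq y\,],
$$

and the goal is that, with high probability, $\sup_h \bigl(\operatorname{err}(h) - \widehat{\operatorname{err}}_S(h)\bigr)$ is small.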

Slide 5

Generalization bounds

How do we upper bound the true error? Use a generalization bound. Roughly speaking (and with high probability):
true error ≤ empirical error + √[(complexity of h) / n]
- A more complex classifier is "easier" to fit to arbitrary data.
- VC-dimension: the size of the largest point set that can be shattered by h's hypothesis class.

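As one concrete instance of the "complexity over n" shape above, a standard VC-type bound (illustrative; not necessarily the exact bound shown on the slide) states that with probability at least $1-\delta$, simultaneously for every $h$ in a class of VC-dimension $d$:

$$
\operatorname{err}(h) \;\le\; \widehat{\operatorname{err}}_S(h) \;+\; O\!\left(\sqrt{\frac{d\,\log(n/d) + \log(1/\delta)}{n}}\right).
$$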

Slide 6

Popular approach for classification

Assume the points are in Euclidean space!
Pros:
- Existence of an inner product
- Efficient algorithms (SVM)
- Good generalization bounds (max margin)
Cons:
- Many natural settings are non-Euclidean
- Euclidean structure is a strong assumption
Recent popular focus:
- Metric space data
- Semimetric space data

Slide 7

Semimetric space

(X, ρ) is a metric space if:
- X = set of points
- ρ(·,·) = distance function ρ: X × X → ℝ
- Nonnegative: ρ(x,x′) ≥ 0, and ρ(x,x′) = 0 ⇔ x = x′
- Symmetric: ρ(x,x′) = ρ(x′,x)
- Triangle inequality: ρ(x,x′) ≤ ρ(x,x′′) + ρ(x′,x′′)

Slide 8

Semimetric space

(X, ρ) is a semimetric space if:
- X = set of points
- ρ(·,·) = distance function ρ: X × X → ℝ
- Nonnegative: ρ(x,x′) ≥ 0, and ρ(x,x′) = 0 ⇔ x = x′
- Symmetric: ρ(x,x′) = ρ(x′,x)
- Triangle inequality: not required

inner product ⊂ norm ⊂ metric ⊂ semimetric

Slide 9

Semimetric examples

- Jensen-Shannon divergence
- Euclidean-squared (ℓ2²)
- Fractional ℓp spaces (p < 1). Example: p = ½, with ||a-b||p = (∑ᵢ |aᵢ-bᵢ|ᵖ)^(1/p)

[Figure: two example point sets with their pairwise distances: (0,0), (0,2), (2,0) with distances 2, 2, 8; and (0,0), (0,1), (2,2) with distances 1, 5, 8.]
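To make the fractional-ℓp example concrete, here is a small sketch (assuming the figure's distances are the p = ½ quasi-norm above) that evaluates ||a-b||½ on three of the figure's points and checks whether the triangle inequality holds:

```python
# Sketch: fractional l_p "distance" for p < 1, evaluated on points from the
# figure. Assumes the slide's formula ||a-b||_p = (sum_i |a_i-b_i|^p)^(1/p).

def lp_dist(a, b, p):
    """Fractional l_p quasi-norm of a - b (a semimetric for p < 1)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

points = [(0, 0), (0, 2), (2, 0)]
p = 0.5

d01 = lp_dist(points[0], points[1], p)  # ~2
d02 = lp_dist(points[0], points[2], p)  # ~2
d12 = lp_dist(points[1], points[2], p)  # ~8

# The triangle inequality would require d12 <= d01 + d02, but 8 > 2 + 2.
print(d01, d02, d12, d12 <= d01 + d02)
```

Running this prints approximately 2, 2, 8 and False, matching the figure and showing that the triangle inequality fails for p = ½.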

Slide 10

Semimetric examples

- Hausdorff distance: point in A farthest from B
- 1-rank Hausdorff distance: point in A closest to B
- k-rank Hausdorff distance: point in A k-th closest to B

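A small sketch of these Hausdorff-style distances between finite point sets, under a natural reading of the descriptions above (the Euclidean base distance and the one-directional "point in A ... from B" reading are assumptions; the slide does not spell them out):

```python
# Sketch: directed Hausdorff-style distances between finite point sets.
import math

def base_dist(a, b):
    return math.dist(a, b)  # Euclidean; any (semi)metric could be plugged in

def dists_to_B(A, B):
    """For each a in A, its distance to the nearest point of B, sorted."""
    return sorted(min(base_dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    return dists_to_B(A, B)[-1]      # point in A farthest from B

def k_rank_hausdorff(A, B, k):
    return dists_to_B(A, B)[k - 1]   # point in A k-th closest to B

A = [(0, 0), (1, 0), (5, 5)]
B = [(0, 1), (4, 4)]
print(hausdorff(A, B), k_rank_hausdorff(A, B, 1), k_rank_hausdorff(A, B, 2))
```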

Slide 11

Semimetric examples

Note: semimetrics are often unintuitive.
Example: diameter > 2 × radius.

[Figure: the points (0,0), (0,2), (2,0) with pairwise distances 2, 2, 8.]
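As a worked instance (assuming the figure's distances are the ℓ½ quasi-norm distances from Slide 9), the radius and the diameter of this three-point set come apart:

$$
\rho\bigl((0,0),(0,2)\bigr)=\rho\bigl((0,0),(2,0)\bigr)=\bigl(\sqrt{2}\bigr)^{2}=2,
\qquad
\rho\bigl((0,2),(2,0)\bigr)=\bigl(\sqrt{2}+\sqrt{2}\bigr)^{2}=8,
$$

so a ball of radius 2 around (0,0) covers the set, while the diameter is 8 > 2 · 2.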

Slides 12–14

Classification for semimetrics?

Problem: no vector representation, hence no notion of a dot product (and no kernel). What to do?
- Invent a kernel (e.g. embed into Euclidean space)?.. Provably high distortion!
- Use some NN heuristic?.. The NN classifier has infinite VC-dimension!
Our approach:
- Sample compression
- NN classification
Result: strong generalization bounds.


Slide 15

Classification for semimetrics?

Central contribution: we discover that the complexity of the classifier is controlled by
- the margin ɣ of the sample, and
- the density dimension of the space,
with bounds close to optimal.

Slide 16

Density dimension

Definition: the ball B(x,r) is the set of all points within distance r of x. The density constant (of a metric M) is the minimum value c bounding the number of points in B(x,r) at mutual distance r/2; the density dimension is dens(M) = log₂ c.

Examples of the density dimension, for a set of n d-dimensional vectors:
- Jensen-Shannon divergence: O(d)
- ℓp (p < 1): O(d/p)
- k-rank Hausdorff: O(k(d + log n))

[Figure: a ball of radius r containing points at mutual distance r/2.]
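A small sketch of how one might estimate the density constant of a finite sample empirically, by greedily packing an r/2-separated subset inside each ball B(x, r). The greedy packing, the strict "> r/2" separation, and the choice of radii to scan are assumptions for illustration, not the slides' procedure:

```python
# Sketch: empirical density constant of a finite sample under a semimetric rho.
# For each center x and radius r, greedily pick points of B(x, r) that are at
# mutual distance > r/2, and record the largest such packing found.

def density_constant(points, rho, radii):
    best = 0
    for x in points:
        for r in radii:
            ball = [p for p in points if rho(x, p) <= r]
            packing = []
            for p in ball:  # greedy r/2-separated packing inside B(x, r)
                if all(rho(p, q) > r / 2 for q in packing):
                    packing.append(p)
            best = max(best, len(packing))
    return best

# Example with the l_{1/2} quasi-norm from Slide 9 (an illustrative choice).
def l_half(a, b):
    return sum(abs(x - y) ** 0.5 for x, y in zip(a, b)) ** 2

pts = [(0, 0), (0, 1), (1, 0), (2, 2), (0, 2), (2, 0)]
print(density_constant(pts, l_half, radii=[1, 2, 4, 8]))
```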

Slides 17–19

Classifier construction

Recall our approach: compress the sample, then classify by NN.
Initial approach: a classifier consistent with respect to the sample, i.e. NN reconstructs the sample labels exactly.
Solution: a ɣ-net N of the sample.


Slide 20

Classifier construction

Solution: a ɣ-net C of the sample S. It must be consistent.
Brute-force construction: O(n²).
Crucial question: how many points are in the net?

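A minimal sketch of the scheme on this and the previous slides: greedily extract a ɣ-net from the labeled sample and classify new points by nearest neighbor in the net. The greedy selection order and the tie-breaking are my assumptions; the slides' actual construction (next slides) is hierarchical and faster than this brute-force version.

```python
# Sketch: brute-force gamma-net of a labeled sample + nearest-neighbor classifier.
# rho is any semimetric; sample is a list of (point, label) pairs.

def gamma_net(sample, rho, gamma):
    """Greedy gamma-net: every sample point is within gamma of some net point,
    and net points are pairwise more than gamma apart.  O(n^2) time."""
    net = []
    for x, y in sample:
        if all(rho(x, c) > gamma for c, _ in net):
            net.append((x, y))
    return net

def nn_classify(x, net, rho):
    """Predict the label of x by its nearest neighbor in the net."""
    return min(net, key=lambda cy: rho(x, cy[0]))[1]

# Usage with the l_{1/2} semimetric from Slide 9 (an illustrative choice).
def l_half(a, b):
    return sum(abs(u - v) ** 0.5 for u, v in zip(a, b)) ** 2

sample = [((0, 0), -1), ((0, 1), -1), ((4, 4), +1), ((4, 5), +1)]
net = gamma_net(sample, l_half, gamma=1.0)
print(len(net), nn_classify((0.5, 0.5), net, l_half))
```

Note that with a sample margin of at least ɣ, nearest-neighbor prediction over such a net reproduces the sample labels, which is the consistency requirement stated above.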


Slides 21–24

Classifier construction

Theorem: If C is an optimally small ɣ-net of the sample S, then |C| ≤ (radius(S) / ɣ)^dens(S).

Proof / construction:
- C ← 2^dens(S) points at mutual distance radius(S)/2
- Associate each point not in C with its NN in C
- Repeat log(radius(S) / ɣ) times
Runtime: n log(radius(S) / ɣ).
Optimality: the constructed net may not be optimally small, but it is NP-hard to do much better.
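The size bound follows from the construction by straightforward counting (my reconstruction of the arithmetic behind the slide, writing R = radius(S) and taking logs base 2): each of the $\log_2(R/\gamma)$ halving rounds multiplies the number of net points by at most $2^{\operatorname{dens}(S)}$, since the density constant bounds how many points at mutual distance $r/2$ fit in a ball of radius $r$. Hence

$$
|C| \;\le\; \bigl(2^{\operatorname{dens}(S)}\bigr)^{\log_2(R/\gamma)}
\;=\; 2^{\operatorname{dens}(S)\,\log_2(R/\gamma)}
\;=\; \left(\frac{R}{\gamma}\right)^{\operatorname{dens}(S)}.
$$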


Slides 25–26

Generalization bounds

Upshot: a generalization bound in terms of the size of the consistent net.
What if we allow the classifier εn errors on the sample? A better bound is possible, where k is the achieved compression size. But it is NP-hard to optimize that bound, so the consistent net is the best possible.
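For flavor, bounds for sample compression schemes typically look like the following (a generic Littlestone–Warmuth-style form, shown only to illustrate the roles of the compression size $k$ and the sample error rate $\varepsilon$; it is not the exact bound from these slides): with probability at least $1-\delta$,

$$
\operatorname{err}(h) \;\lesssim\; \frac{\varepsilon n}{\,n-k\,} \;+\; \sqrt{\frac{k\log n + \log(1/\delta)}{\,n-k\,}}.
$$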

Slide 27

Generalization bounds

Upshot: even under margin assumptions, a sample of size exponential in dens will be required for some distributions.