Generating Well-Behaved Learning Curves - PowerPoint Presentation

Presentation Transcript


Generating Well-Behaved Learning Curves: An Empirical Study

Gary M. Weiss and Alexander Battistin
Fordham University

Motivation
- Classification performance related to amount of training data
- Relationship visually represented by learning curve:
  - Performance increases steeply at first
  - Slope begins to decrease with adequate training data
  - Slope approaches 0 as more data barely helps
- Training data often costly (cost of collecting or labeling)
- "Good" learning curves can help identify optimal amount of data

7/22/2014, DMIN 2014

Exploiting Learning Curves
- In practice we only have the learning curve up to the current number of examples when deciding whether to acquire more
- Need to predict performance for larger sizes
  - Can do so iteratively and acquire data in batches
  - Can even use curve fitting
- Works best if learning curves are well behaved (smooth and regular)
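The curve-fitting idea above can be sketched as follows. This is a hypothetical illustration, not the paper's method: it assumes an inverse power-law model for accuracy and fits it to a synthetic, noise-free curve before extrapolating.

```python
# Hedged sketch: fit acc(n) = a - b * n^(-c) to the observed part of a
# learning curve, then extrapolate to a larger training-set size. The model
# form and the synthetic data are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # a: asymptotic accuracy (plateau); b, c: shape of the initial rise
    return a - b * n ** (-c)

# Hypothetical accuracies observed at the training sizes acquired so far
sizes = np.array([100, 200, 400, 800, 1600, 3200], dtype=float)
accs = power_law(sizes, 0.90, 0.8, 0.5)  # synthetic, noise-free for clarity

params, _ = curve_fit(power_law, sizes, accs, p0=[0.8, 1.0, 0.5])
predicted = power_law(12800.0, *params)  # estimate accuracy at a larger size
```

With real, noisy curves the fit would be less exact; the point is only that an analytic form lets us estimate performance at sizes we have not yet paid for.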


Prior Work Using Learning Curves
- Provost, Jensen and Oates [1] evaluated progressive sampling schemes to identify the point where learning curves begin to plateau
- Weiss and Tian [2] examined how learning curves can be used to optimize learning when performance, acquisition costs, and CPU time are considered:
  - "Because the analyses are all driven by the learning curves, any method for improving the quality of the learning curves (i.e., smoothness, monotonicity) would improve the quality of our results, especially the effectiveness of the progressive sampling strategies."

[1] Provost, F., Jensen, D., and Oates, T. 1999. Efficient progressive sampling. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, 23-32.
[2] Weiss, G. M., and Tian, Y. 2008. Maximizing classifier utility when there are data acquisition and modeling costs. Data Mining and Knowledge Discovery, 17(2): 253-282.
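A minimal sketch of the kind of geometric progressive sampling schedule evaluated by Provost, Jensen and Oates; the doubling factor, plateau tolerance, and synthetic accuracy function here are our assumptions, not values from either paper.

```python
# Hedged sketch of geometric progressive sampling: double the training-set
# size until the accuracy gain falls below a tolerance (the curve plateaus).

def geometric_schedule(n0, total, factor=2):
    """Yield training-set sizes n0, n0*factor, ..., capped at total."""
    n = n0
    while True:
        yield n
        if n >= total:
            return
        n = min(n * factor, total)

def progressive_sample(schedule, train_and_score, tol):
    """Stop at the first size whose accuracy gain over the last is < tol."""
    prev = None
    for n in schedule:
        acc = train_and_score(n)
        if prev is not None and acc - prev < tol:
            return n, acc  # plateau detected
        prev = acc
    return n, acc  # ran out of data before plateauing

# Demo with a synthetic accuracy function that plateaus near 0.9;
# train_and_score would normally fit and evaluate a real model.
demo_acc = lambda n: 0.9 - 0.8 * n ** -0.5
stop_n, stop_acc = progressive_sample(
    geometric_schedule(100, 100_000), demo_acc, tol=0.005)
```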


What We Do
- Generate learning curves for six data sets
  - Different classification algorithms
  - Random sampling and cross validation
- Evaluate the curves
  - Visually, for smoothness and monotonicity
  - Via the "variance" of the learning curve

The Data Sets

Name         # Examples   Classes   # Attributes
Adult        32,561       2         14
Coding       20,000       2         15
Blackjack    15,000       2         4
Boa1         11,000       2         68
Kr-vs-kp     3,196        2         36
Arrhythmia   452          2         279

Experiment Methodology
- Sampling strategies
  - 10-fold cross validation: 90% of the data available for training
  - Random sampling: 75% of the data available for training
- Training set sizes sampled at regular 2% intervals of the available data
- Classification algorithms (from WEKA)
  - J48 decision tree
  - Random Forest
  - Naïve Bayes
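A rough sketch of this setup, with stated assumptions: scikit-learn stands in for WEKA, a synthetic data set stands in for the six real ones, and a generic decision tree approximates J48.

```python
# Hedged sketch: accuracy by 10-fold cross validation at training-set sizes
# spaced at regular 2% intervals of the available data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve
from sklearn.tree import DecisionTreeClassifier  # rough analogue of J48

X, y = make_classification(n_samples=2000, random_state=0)  # stand-in data set
fractions = np.linspace(0.02, 1.0, 50)  # 2%, 4%, ..., 100% of available data

sizes, train_scores, test_scores = learning_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    train_sizes=fractions, cv=10, scoring="accuracy",
    shuffle=True, random_state=0)

curve = test_scores.mean(axis=1)  # one accuracy point per training-set size
```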


Results: Accuracy

Dataset      J48     Random Forest   Naïve Bayes
Adult        86.3    84.3            83.4
Coding       72.2    79.3            71.2
Blackjack    72.3    71.7            67.8
Boa1         54.7    56.0            58.0
Kr-vs-kp     99.4    98.7            87.8
Arrhythmia   65.4    65.2            62.0
Average      75.1    75.9            71.7

Accuracy is not our focus, but a well-behaved learning curve for a method that produces poor results is not useful. These results are for the largest training set size (no reduction). J48 and Random Forest are competitive, so we will focus on them.

Results: Variances

Dataset      J48      Random Forest   Naïve Bayes
Adult        0.51     0.32            0.01
Coding       9.78     17.08           0.19
Blackjack    0.36     2.81            0.01
Boa1         0.20     0.31            0.73
Kr-vs-kp     3.54     12.08           4.34
Arrhythmia   41.46    15.87           9.90

The variance for a curve equals the average variance in performance across the evaluated training set sizes. The results are for 10-fold cross validation. Naïve Bayes is best, followed by J48, but Naïve Bayes had low accuracy (see the previous slide).
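Under the definition above (the average, over the evaluated training-set sizes, of the per-size variance in performance), the metric can be sketched as below; the fold scores are made up for illustration.

```python
# Hedged sketch of the curve-variance metric: variance of the fold scores at
# each training-set size, averaged over all evaluated sizes.
import numpy as np

def curve_variance(scores_per_size):
    """scores_per_size: 2-D array, rows = training sizes, cols = folds/runs."""
    scores = np.asarray(scores_per_size, dtype=float)
    return scores.var(axis=1).mean()

# Hypothetical accuracies (%) from 3 folds at 4 training-set sizes
scores = [[70.0, 71.0, 72.0],
          [74.0, 74.5, 75.5],
          [76.0, 76.0, 77.0],
          [77.0, 77.5, 77.7]]
print(round(curve_variance(scores), 3))  # prints 0.341
```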

J48 Learning Curves (10 xval) [figure]

Random Forest Learning Curves [figure]

Naïve Bayes Learning Curves [figure]

A Closer Look at J48 and RF (Adult) [figure]

A Closer Look at J48 and RF (kr-vs-kp) [figure]

A Closer Look at J48 and RF (Arrhythmia) [figure]

Now let's compare cross validation to random sampling, which we find generates less well-behaved curves. (The curves shown so far used cross validation.)
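The random-sampling alternative, as we read it, can be sketched like this: hold out a fixed 25% test set, then train one model per size on random subsets of the remaining 75%. The library, data set, and sizes are illustrative assumptions.

```python
# Hedged sketch of random sampling: a fixed 25% holdout for testing, random
# training subsets of the other 75% at increasing sizes.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

rng = np.random.default_rng(0)
rs_curve = []
for frac in np.linspace(0.02, 1.0, 50):  # 2% intervals of the available data
    n = int(frac * len(X_tr))
    idx = rng.choice(len(X_tr), size=n, replace=False)
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx])
    rs_curve.append(clf.score(X_te, y_te))
```

With a single model per size and no fold averaging, this curve is naturally noisier than its cross-validation counterpart, which is consistent with the observation above.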

J48 Learning Curves (Blackjack Data Set) [figure]

RF Learning Curves (Blackjack Data Set) [figure]

Conclusions
- Introduced the notion of well-behaved learning curves and methods for evaluating this property
- Naïve Bayes seemed to produce much smoother curves, but was less accurate
  - Its low variance may be because it consistently reaches a plateau early
- J48 and Random Forest seem reasonable; need more data sets to determine which is best
- Cross validation clearly generates better curves than random sampling (less randomness?)

Future Work
- Need a more comprehensive evaluation
  - Many more data sets
  - Compare more algorithms
- Additional metrics
  - Count the number of drops in performance with greater size (i.e., "blips"); need a better summary metric
  - Vary the number of runs; more runs almost certainly yields smoother learning curves
- Evaluate in context
  - Ability to identify the optimal learning point
  - Ability to identify the plateau (based on some criterion)
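The "blips" count mentioned above could be as simple as the following sketch (our reading, not the paper's code): count adjacent pairs where accuracy drops despite more data.

```python
# Hedged sketch of a "blips" metric: the number of times performance drops
# as the training-set size grows. A perfectly monotone curve scores 0.

def count_blips(curve):
    """Count adjacent pairs where performance drops despite more data."""
    return sum(1 for prev, cur in zip(curve, curve[1:]) if cur < prev)

smooth = [70.0, 74.0, 76.0, 77.0, 77.5]
bumpy = [70.0, 74.0, 73.0, 77.0, 76.5]
print(count_blips(smooth), count_blips(bumpy))  # prints: 0 2
```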

If Interested in This Area
Provost, F., Jensen, D., and Oates, T. 1999. Efficient progressive sampling. In Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, 23-32.
Weiss, G. M., and Tian, Y. 2008. Maximizing classifier utility when there are data acquisition and modeling costs. Data Mining and Knowledge Discovery, 17(2): 253-282.
Contact me if you want to work on expanding this paper (gaweiss@fordham.edu).
