
Slide 1

Trade-offs in Explanatory Model Learning 

DCAP Meeting – Madalina Fiterau

22nd of February 2012

Slide 2

Outline

Motivation: need for interpretable models
Overview of data analysis tools
Model evaluation – accuracy vs. complexity
Model evaluation – understandability
Example applications
Summary

Slide 3

Example Application: Nuclear Threat Detection

Border control: vehicles are scanned
Human in the loop interpreting the results

[Diagram: vehicle scan → prediction → feedback loop.]

Slide 4

Boosted Decision Stumps

Accurate, but hard to interpret.
How is the prediction derived from the input?

Slide 5

Decision Tree – More Interpretable

[Figure: a small decision tree with yes/no branches. Its tests are "Radiation > x%", "Payload type = ceramics", and "Uranium level > max. admissible for ceramics" (considering the balance of Th232, Ra226 and Co60); its leaves are "Clear" and "Threat".]

Slide 6

Motivation

Many users are willing to trade accuracy to better understand the system-yielded results.

Need: a simple, interpretable model
Need: an explanatory prediction process

Slide 7

Analysis Tools – Black-box

Slide 8

Analysis Tools – White-box

Slide 9

Explanation-Oriented Partitioning

[Figure: synthetic data – 2 Gaussians and a uniform cube, shown in an (X,Y) plot.]

Slides 10–22

EOP Execution Example – 3D data

Step 1: Select a projection, e.g. (X1, X2).
Step 2: Choose a good classifier on that projection – call it h1.
Step 3: Estimate the accuracy of h1 at each point (OK / NOT OK).
Step 4: Identify the high-accuracy regions.
Step 5: Remove the training points covered by those regions from consideration.

This completes the first iteration; the second iteration repeats the procedure on the remaining data. Iterate until all data is accounted for or the error cannot be decreased. A sketch of this loop follows.
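The slides present this loop only as pictures; the following is a minimal, illustrative sketch of the procedure as described above, not the author's implementation. It assumes numpy arrays, scikit-learn SVM base classifiers, 2-D axis-aligned projections and hyper-rectangular high-accuracy regions; all names and thresholds are placeholders.

```python
# Minimal sketch of the EOP training loop described above (illustrative only,
# not the original implementation). Assumes a numeric feature matrix X and
# integer labels y in {0, 1}.
from itertools import combinations

import numpy as np
from sklearn.svm import SVC


def fit_eop(X, y, accuracy_threshold=0.9, max_iterations=10):
    remaining = np.arange(len(X))     # indices of points still unaccounted for
    model = []                        # list of (projection, classifier, region)

    for _ in range(max_iterations):
        if len(remaining) == 0:       # all data accounted for
            break
        best = None
        # Step 1: consider every 2-D projection of the features.
        for proj in combinations(range(X.shape[1]), 2):
            Xp, yp = X[np.ix_(remaining, proj)], y[remaining]
            if len(np.unique(yp)) < 2:
                continue
            # Step 2: choose a classifier h on this projection.
            h = SVC(kernel="rbf").fit(Xp, yp)
            # Step 3: estimate where h is accurate (training correctness is
            # used here as a crude stand-in for a pointwise accuracy estimate).
            correct = h.predict(Xp) == yp
            if not correct.any():
                continue
            # Step 4: identify a high-accuracy region: here, simply the
            # bounding box of the correctly classified points.
            lo, hi = Xp[correct].min(axis=0), Xp[correct].max(axis=0)
            inside = np.all((Xp >= lo) & (Xp <= hi), axis=1)
            acc = correct[inside].mean()
            if acc >= accuracy_threshold and (best is None or inside.sum() > best[0]):
                best = (inside.sum(), proj, h, (lo, hi), inside)
        if best is None:              # error cannot be decreased any further
            break
        _, proj, h, region, inside = best
        model.append((proj, h, region))
        # Step 5: remove the covered training points from consideration.
        remaining = remaining[~inside]

    default = np.bincount(y).argmax() # default value for uncovered queries
    return model, default
```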

Slide 23

Learned Model – Processing a query [x1 x2 x3]

[x1 x2] in R1?  yes → h1(x1, x2)
                no  ↓
[x2 x3] in R2?  yes → h2(x2, x3)
                no  ↓
[x1 x3] in R3?  yes → h3(x1, x3)
                no  → Default Value
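A query is answered by walking down this decision list. Below is a minimal sketch of the prediction step, matching the illustrative model representation used in the training sketch above (again an assumption, not the original code).

```python
import numpy as np


def predict_eop(x, model, default):
    """Walk the decision list: the first region R_k that contains the projected
    query answers it with its classifier h_k; otherwise return the default value."""
    for proj, h, (lo, hi) in model:
        xp = x[list(proj)]                          # project the query
        if np.all(xp >= lo) and np.all(xp <= hi):   # "[x_i x_j] in R_k ?"
            return h.predict(xp.reshape(1, -1))[0]  # h_k(x_i, x_j)
    return default
```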

Slide 24

Parametric / Nonparametric Regions

Bounding polyhedra (parametric): enclose points in convex shapes (hyper-rectangles / spheres).
  + Easy to test inclusion; visually appealing.
  - Inflexible.

Nearest-neighbor score (nonparametric): consider the k nearest neighbors; Region = { X | Score(X) > t }, where t is a learned threshold.
  + Easy to test inclusion; deals with irregularities.
  - Can look insular.

[Figure: the decision for a query point p is based on its nearest neighbors n1–n5, distinguishing correctly and incorrectly classified points.]
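The slide only states the region definition Region = { X | Score(X) > t }. Below is a minimal sketch of one plausible nearest-neighbor score, where a query's score is the fraction of its k nearest training points that the base classifier handled correctly; the scoring rule and the fixed threshold are illustrative assumptions, not the talk's exact formulation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def make_nn_region(X_train, correct, k=5, t=0.8):
    """Nonparametric region: a query is inside the region when more than a
    fraction t of its k nearest training points were classified correctly.
    `correct` is a boolean vector over the training points."""
    correct = np.asarray(correct, dtype=float)
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)

    def score(X_query):
        _, idx = nn.kneighbors(np.atleast_2d(X_query))
        return correct[idx].mean(axis=1)     # fraction of correct neighbours

    def in_region(X_query):
        return score(X_query) > t            # Region = { X | Score(X) > t }

    return score, in_region
```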

Slide 25

EOP in Context

[Diagram: local models / models trained on all features – Feating.]

Related method            Similarities                          Differences
CART                      decision structure                    default classifiers in the leaves
Subspacing                low-d projections                     keeps all data points
Boosting / Multiboosting  gradually deals with difficult data   committee decision; ensemble learner

Slide 26

Outline

Motivation: need for interpretable models

Overview of data analysis tools
Model evaluation – accuracy vs. complexity
Model evaluation – understandability
Example applications
Summary

Slide 27

Overview of Datasets

Real-valued features, binary output
Artificial data – 10 features; low-d Gaussians / uniform cubes
UCI repository
Application-related datasets

Results obtained by k-fold cross-validation:
  Accuracy
  Complexity = expected number of vector operations performed for a classification task (see the sketch below)
  Understandability = w.r.t. acknowledged metrics
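The complexity measure is only described verbally; as an illustration, for an EOP-style decision list it could be estimated as the average number of vector operations (one per region test, plus one classifier evaluation when a region matches) spent per query over an evaluation set. The per-operation costs and the model layout below are assumptions carried over from the earlier sketches.

```python
def expected_vector_operations(X_eval, model, region_test_cost=1, classifier_cost=1):
    """Average number of vector operations per classification task for a
    decision-list model (X_eval: numpy array of query rows)."""
    total = 0
    for x in X_eval:
        for proj, h, (lo, hi) in model:
            total += region_test_cost           # test "is the query in this region?"
            xp = x[list(proj)]
            if (xp >= lo).all() and (xp <= hi).all():
                total += classifier_cost        # evaluate the region's classifier
                break
    return total / len(X_eval)
```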

Slide 28

EOP vs. AdaBoost (SVM base classifiers)

EOP is often less accurate, but not significantly so, while the reduction in complexity is statistically significant (see the sketch below):
  Accuracy – mean difference: 0.5%; paired t-test p-value: 0.832
  Complexity – mean difference: 85; paired t-test p-value: 0.003

[Charts: accuracy and complexity of Boosting vs. nonparametric EOP.]
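The significance claims come from paired t-tests over the matched accuracy and complexity measurements of the two methods. A minimal sketch of that test with scipy follows; the per-fold numbers are placeholders, not the results from the talk.

```python
import numpy as np
from scipy.stats import ttest_rel

# Matched per-fold (or per-dataset) measurements: placeholder values only.
boosting_acc  = np.array([0.91, 0.88, 0.93, 0.90, 0.92])
eop_acc       = np.array([0.90, 0.88, 0.92, 0.90, 0.91])
boosting_cplx = np.array([220.0, 240.0, 210.0, 230.0, 225.0])
eop_cplx      = np.array([130.0, 150.0, 140.0, 135.0, 145.0])

# Paired t-tests: is the accuracy difference significant?
# Is the complexity reduction significant?
print("accuracy:   p =", ttest_rel(boosting_acc, eop_acc).pvalue)
print("complexity: p =", ttest_rel(boosting_cplx, eop_cplx).pvalue)
print("mean diff in accuracy:  ", (boosting_acc - eop_acc).mean())
print("mean diff in complexity:", (boosting_cplx - eop_cplx).mean())
```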

Slide 29

EOP (stumps as base classifiers) vs. CART on data from the UCI repository

[Charts: accuracy and complexity of CART, nonparametric EOP (EOP N.) and parametric EOP (EOP P.) on each dataset.]

Dataset          # of Features   # of Points
Breast Tissue    10              1006
Vowel            9               990
MiniBOONE        10              5000
Breast Cancer    10              596

CART is the most accurate; parametric EOP yields the simplest models.

Slides 30–32

Why are EOP models less complex? Consider a typical XOR dataset.

CART: is accurate, but takes many iterations and does not uncover or leverage the structure of the data.
EOP: is equally accurate and uncovers the structure.

[Figures: the XOR quadrant pattern (+ o / o +) and the regions found by EOP in iterations 1 and 2.]
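One way to see the effect (an illustrative experiment, not taken from the talk): on quadrant-XOR data no single axis-aligned split is informative, so greedy CART typically grows a tree markedly deeper than the two-level optimum, while the quadrant structure itself is described by a few rectangles. A sketch using scikit-learn's DecisionTreeClassifier as the CART stand-in:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)   # XOR labelling of the quadrants

cart = DecisionTreeClassifier().fit(X, y)
print("CART depth:", cart.get_depth(), "leaves:", cart.get_n_leaves())
# Per the slides, EOP uncovers this quadrant structure in two iterations.
```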

Slides 33–34

Error Variation with Model Complexity for EOP and CART

At low complexities, EOP is typically more accurate.

[Charts: error vs. depth of the decision tree/list, and EOP_Error - CART_Error vs. depth.]

Slide 35

UCI data – Accuracy

White-box models – including EOP – do not lose much accuracy.

Slide 36

UCI data – Model complexity

The complexity of Random Forests is huge – thousands of nodes.
White-box models – especially EOP – are less complex.

Slide 37

Robustness

Accuracy-targeting EOP (AT-EOP):
  identifies which portions of the data can be confidently classified at a given rate;
  is allowed to set aside the noisy part of the data.

[Chart: accuracy of AT-EOP on the 3D data as a function of the maximum allowed expected error.]

Slide 38

Outline

Motivation: need for interpretable models

Overview of data analysis tools
Model evaluation – accuracy vs. complexity
Model evaluation – understandability
Example applications
Summary

Slide 39

Metrics of Explainability*

* L. Geng and H. J. Hamilton, 'Interestingness measures for data mining: A survey'.

Slide 40

Evaluation with usefulness metrics

For 3 out of 4 metrics, EOP beats CART. Higher values are better.

                 CART                                EOP
Dataset    BF      L      J      NMI        BF      L      J      NMI
MB         1.982   0.004  0.389  0.040      1.889   0.007  0.201  0.502
BCW        1.057   0.007  0.004  0.011      2.204   0.069  0.150  0.635
BT         0.000   0.009  0.210  0.000      Inf     0.021  0.088  0.643
V          Inf     0.020  0.210  0.010      2.166   0.040  0.177  0.383
Mean       1.520   0.010  0.203  0.015      2.047   0.034  0.154  0.541

BF = Bayes Factor. L = Lift. J = J-score. NMI = Normalized Mutual Information.
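The metric definitions are not reproduced in the talk (it defers to the Geng & Hamilton survey). As an illustration, two of them could be computed for a single region/rule as below, treating the rule as "the query falls in this region"; the exact formulas behind the table, including the Bayes Factor and J-score variants, are not specified here and these implementations are assumptions.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score


def rule_lift(covered, positive):
    """Lift: P(positive | covered) / P(positive)."""
    covered = np.asarray(covered, dtype=bool)
    positive = np.asarray(positive, dtype=bool)
    return positive[covered].mean() / positive.mean()


def rule_nmi(covered, positive):
    """Normalized mutual information between region membership and the class label."""
    return normalized_mutual_info_score(np.asarray(positive, dtype=int),
                                        np.asarray(covered, dtype=int))
```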

Slide 41

Outline

Motivation: need for interpretable models

Overview of data analysis tools
Model evaluation – accuracy vs. complexity
Model evaluation – understandability
Example applications
Summary

Slide 42

Spam Detection (UCI ‘SPAMBASE’)

10 features: frequencies of misc. words in e-mails
Output: spam or not

[Chart: complexity.]

Slide 43

Spam Detection – Iteration 1

The classifier labels everything as spam.
The high-confidence regions do enclose mostly spam, and:
  the incidence of the word 'your' is low;
  the length of text in capital letters is high.

Slide 44

Spam Detection – Iteration 2

the required incidence of capitals is increased

the square region on the left also encloses examples that will be marked as `not spam'

Slide 45

Spam Detection – Iteration 3

The classifier marks everything as spam.
The frequencies of 'our' and 'hi' determine the regions.

Slide 46

Example Applications

Stem Cells – explain relationships between treatment and cell evolution (challenge: sparse features).
ICU Medication – identify combinations of drugs that correlate with readmissions (challenge: class imbalance).
Fuel Consumption – identify causes of excess fuel consumption among driving behaviors (challenge: adapted regression problem).
Nuclear Threats – support interpretation of radioactivity detected in cargo containers (challenges: high-d data; train/test sets from different distributions).

Slide 47

Effects of Cell Treatment

Monitored a population of cells
7 features: cycle time, area, perimeter, ...
Task: determine which cells were treated

[Chart: complexity.]

Slide 49

Mimic Medication Data

Information about administered medication
Features: dosage for each drug
Task: predict patient return to the ICU

[Chart: complexity.]

Slide 51

Predicting Fuel Consumption

10 features: vehicle and driving style characteristics
Output: fuel consumption level (high/low)

[Chart: complexity.]

Slide 53

Nuclear threat detection data

325 features
Random Forests accuracy: 0.94
Rectangular EOP accuracy: 0.881 ... but:

Regions found in the 1st iteration for Fold 0:
  incident.riidFeatures.SNR in [2.90, 9.2]
  Incident.riidFeatures.gammaDose in [0, 1.86]*10^-8
Regions found in the 1st iteration for Fold 1:
  incident.rpmFeatures.gamma.sigma in [2.5, 17.381]
  incident.rpmFeatures.gammaStatistics.skewdose in [1.31, …]

No match between the folds.

Slide 54

Feating and EOP

Both use decision structures to pick the right classification model.

EOP: flexible regions; decision list; models trained on subspaces.
Feating: tiles in feature space; decision tree; models trained on all features.

Slide 55

Summary

In most cases EOP:
  maintains accuracy;
  reduces complexity;
  identifies useful aspects of the data.
EOP typically wins in terms of expressiveness.

Open questions:
  What if no good low-dimensional projections exist?
  What to do with inconsistent models across folds of CV?
  What is the best metric of explainability?