Trade-offs in Explanatory Model Learning
DCAP Meeting, Madalina Fiterau
22nd of February 2012
Outline
Motivation: need for interpretable models
Overview of data analysis tools
Model evaluation – accuracy vs. complexity
Model evaluation – understandability
Example applications
Summary
Example Application: Nuclear Threat Detection
Border control: vehicles are scanned
Human in the loop interprets the results
[Diagram: vehicle scan -> prediction -> feedback loop]
Boosted Decision Stumps
Accurate, but hard to interpret
How is the prediction derived from the input?
Decision Tree – More Interpretable
[Decision tree with internal tests: Radiation > x%?, Payload type = ceramics?, Uranium level > max. admissible for ceramics?; yes/no branches lead to the leaves: Clear, Threat, and Consider balance of Th232, Ra226 and Co60]
Motivation
Many users are willing to trade accuracy to better understand the system-yielded results.
Need: a simple, interpretable model
Need: an explanatory prediction process
Analysis Tools – Black-box
Analysis Tools – White-box
Explanation-Oriented Partitioning
[Figure: (X,Y) plot of synthetic data consisting of 2 Gaussians and a uniform cube]
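As a concrete illustration of the kind of data shown in this plot, here is a minimal Python sketch that generates two Gaussian clusters and a uniform cube; the means, scales, and class assignment of the cube are assumptions for illustration, not the values used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian clusters (one per class) plus points drawn uniformly from a cube.
n = 200
gauss_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(n, 2))    # class 1
gauss_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(n, 2))  # class 0
cube = rng.uniform(low=-1.0, high=1.0, size=(n, 2))               # uniform cube, class 0 (assumed)

X = np.vstack([gauss_pos, gauss_neg, cube])
y = np.concatenate([np.ones(n), np.zeros(n), np.zeros(n)])
```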
EOP Execution Example – 3D data
Step 1: Select a projection, e.g. (X1, X2)
Step 2: Choose a good classifier on that projection; call it h1
Step 3: Estimate the accuracy of h1 at each point (OK / NOT OK)
Step 4: Identify high-accuracy regions
Step 5: Remove the training points covered by those regions from consideration
This completes the first iteration; the second iteration proceeds in the same way.
Iterate until all data is accounted for or the error cannot be decreased.
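To make Steps 1-5 concrete, the following is a minimal Python sketch of one plausible version of the EOP loop. It is an illustration under simplifying assumptions (exhaustive 2-D projections, decision stumps as base classifiers, a single bounding box per iteration); it is not the implementation behind these slides.

```python
import itertools
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def fit_eop(X, y, max_iters=5, min_region_acc=0.95):
    model = []                        # list of (projection, classifier, box)
    remaining = np.ones(len(y), dtype=bool)
    for _ in range(max_iters):
        if remaining.sum() < 20:      # (nearly) all data accounted for
            break
        best = None
        # Step 1: try every 2-D projection of the remaining data
        for proj in itertools.combinations(range(X.shape[1]), 2):
            Xp, yp = X[remaining][:, proj], y[remaining]
            # Step 2: choose a simple classifier h on this projection
            h = DecisionTreeClassifier(max_depth=1).fit(Xp, yp)
            # Step 3: estimate where h is accurate (OK / NOT OK per point)
            ok = h.predict(Xp) == yp
            if not ok.any():
                continue
            # Step 4: a high-accuracy region = bounding box of the OK points
            lo, hi = Xp[ok].min(axis=0), Xp[ok].max(axis=0)
            inside = np.all((Xp >= lo) & (Xp <= hi), axis=1)
            acc = (ok & inside).sum() / inside.sum()
            if acc >= min_region_acc and (best is None or inside.sum() > best[3].sum()):
                best = (proj, h, (lo, hi), inside)
        if best is None:              # error cannot be decreased
            break
        proj, h, box, inside = best
        model.append((proj, h, box))
        # Step 5: remove the covered training points from consideration
        idx = np.flatnonzero(remaining)
        remaining[idx[inside]] = False
    return model
```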
Learned Model – Processing a query [x1 x2 x3]
Is [x1 x2] in R1? If yes: predict h1(x1, x2).
If no, is [x2 x3] in R2? If yes: predict h2(x2, x3).
If no, is [x1 x3] in R3? If yes: predict h3(x1, x3).
If no: return the default value.
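A query is processed by walking down this decision list. The sketch below continues the hypothetical fit_eop example above; the default_value argument and the box representation of regions are assumptions for illustration.

```python
import numpy as np


def predict_eop(model, x, default_value=0):
    """Process one query x (1-D numpy array) through the decision list."""
    for proj, h, (lo, hi) in model:
        xp = x[list(proj)]
        if np.all((xp >= lo) & (xp <= hi)):          # is [x_i x_j] in R ?
            return h.predict(xp.reshape(1, -1))[0]   # answer with the paired classifier
    return default_value                             # no region matched
```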
Parametric / Nonparametric Regions
Bounding Polyhedra (parametric): enclose points in convex shapes (hyper-rectangles / spheres). Easy to test inclusion; visually appealing; inflexible.
Nearest-neighbor Score (nonparametric): consider the k nearest neighbors; Region = { X | Score(X) > t }, where t is a learned threshold. Easy to test inclusion; deals with irregularities; can look insular.
[Figure: nearest-neighbor scoring of a query point p using its neighbors n1..n5, each marked as correctly or incorrectly classified, to make the decision]
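The two region representations can be sketched as follows; the function names and the k-NN scoring rule are illustrative assumptions consistent with the slide, not the exact definitions used in EOP.

```python
import numpy as np


def in_box(x, lo, hi):
    """Parametric region: inclusion test for a bounding hyper-rectangle."""
    return bool(np.all((x >= lo) & (x <= hi)))


def knn_score(x, anchors, correct, k=5):
    """Nonparametric region score: fraction of the k nearest anchor points
    that were correctly classified by the region's classifier."""
    d = np.linalg.norm(anchors - x, axis=1)
    return correct[np.argsort(d)[:k]].mean()


def in_knn_region(x, anchors, correct, t, k=5):
    """Region = { X | Score(X) > t }, with t a learned threshold."""
    return knn_score(x, anchors, correct, k) > t
```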
EOP in context
EOP builds local models, as does Feating (whose local models are trained on all features).
Similarities and differences with related methods:
CART: similar decision structure; but CART only has default classifiers in its leaves.
Subspacing: also uses low-d projections; but keeps all data points for every model.
Boosting / Multiboosting: also deal gradually with difficult data; but are ensemble learners with a committee decision.
Outline
Motivation: need for interpretable models
Overview of data analysis tools
Model evaluation – accuracy vs. complexity
Model evaluation – understandability
Example applications
Summary
Overview of datasets
Real-valued features, binary output
Artificial data – 10 features, low-d Gaussians / uniform cubes
UCI repository
Application-related datasets
Results obtained by k-fold cross-validation
Accuracy
Complexity = expected number of vector operations performed for a classification task
Understandability = w.r.t. acknowledged metrics
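One way to read the complexity metric is as the expected number of vector operations (region-membership tests plus a classifier evaluation) a decision list performs per query. The accounting below is an assumption made only to illustrate the definition, not the exact counting used in the talk.

```python
def expected_vector_ops(region_counts, n_points):
    """region_counts[i] = number of points resolved by region i, in decision-list
    order; a point resolved by region i pays (i + 1) membership tests plus one
    classifier evaluation, while a point that falls through to the default value
    pays only the membership tests."""
    ops, resolved = 0.0, 0
    for i, c in enumerate(region_counts):
        ops += c * ((i + 1) + 1)
        resolved += c
    ops += (n_points - resolved) * len(region_counts)
    return ops / n_points


# e.g. a 3-region list that resolves 120, 50 and 20 of 200 points
print(expected_vector_ops([120, 50, 20], 200))
```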
EOP vs. AdaBoost (SVM base classifiers)
EOP is often less accurate, but not significantly so; the reduction in complexity is statistically significant.
Accuracy: mean difference 0.5%, paired t-test p-value 0.832
Complexity: mean difference 85, paired t-test p-value 0.003
[Figure: accuracy and complexity of Boosting vs. nonparametric EOP]
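The significance claim can be checked with a paired t-test over per-fold differences, as sketched below; the per-fold numbers are placeholders, only the procedure mirrors the slide.

```python
from scipy.stats import ttest_rel

# Placeholder per-fold results, NOT the numbers behind the slide.
boost_acc = [0.91, 0.88, 0.93, 0.90, 0.89]
eop_acc   = [0.90, 0.88, 0.92, 0.90, 0.88]
boost_ops = [310, 295, 330, 305, 300]        # complexity: vector ops per query
eop_ops   = [220, 215, 240, 225, 210]

print(ttest_rel(boost_acc, eop_acc).pvalue)  # accuracy difference: expect a large p-value
print(ttest_rel(boost_ops, eop_ops).pvalue)  # complexity difference: expect a small p-value
```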
EOP (stumps as base classifiers) vs. CART on data from the UCI repository
[Figure: accuracy and complexity of CART, nonparametric EOP (EOP N.) and parametric EOP (EOP P.) on the datasets below]

Dataset         # of Features   # of Points
Breast Tissue   10              1006
Vowel           9               990
MiniBOONE       10              5000
Breast Cancer   10              596

CART is the most accurate.
Parametric EOP yields the simplest models.
Why are EOP models less complex? A typical XOR dataset
[Figure: XOR-style data with '+' and 'o' clusters]
CART: is accurate, but takes many iterations and does not uncover or leverage the structure of the data.
EOP: equally accurate, and uncovers the structure (Iteration 1, Iteration 2).
At low complexities, EOP is typically more accurate.
[Figure: error variation with model complexity for EOP and CART; x-axis: depth of the decision tree/list, y-axis: error]
[Figure: EOP_Error - CART_Error as a function of the depth of the decision tree/list]
UCI data – Accuracy
White-box models – including EOP – do not lose much accuracy.
UCI data – Model complexity
The complexity of Random Forests is huge (thousands of nodes).
White-box models – especially EOP – are less complex.
Robustness
Accuracy-targeting EOP (AT-EOP):
Identifies which portions of the data can be confidently classified at a given rate.
Allowed to set aside the noisy part of the data.
[Figure: accuracy of AT-EOP on 3D data vs. the maximum allowed expected error]
Outline
Motivation: need for interpretable models
Overview of data analysis tools
Model evaluation – accuracy vs. complexity
Model evaluation – understandability
Example applications
Summary
Metrics of Explainability*
* L. Geng and H. J. Hamilton, 'Interestingness measures for data mining: A survey'
Evaluation with usefulness metrics
For 3 out of 4 metrics, EOP beats CART. Higher values are better.

        CART                            EOP
        BF      L      J      NMI       BF      L      J      NMI
MB      1.982   0.004  0.389  0.040     1.889   0.007  0.201  0.502
BCW     1.057   0.007  0.004  0.011     2.204   0.069  0.150  0.635
BT      0.000   0.009  0.210  0.000     Inf     0.021  0.088  0.643
V       Inf     0.020  0.210  0.010     2.166   0.040  0.177  0.383
Mean    1.520   0.010  0.203  0.015     2.047   0.034  0.154  0.541

BF = Bayes Factor. L = Lift. J = J-score. NMI = Normalized Mutual Information.
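For reference, common textbook forms of these metrics can be computed from a rule's contingency counts as sketched below; the exact variants used here follow Geng and Hamilton and may differ in detail.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score


def rule_metrics(n11, n10, n01, n00):
    """Metrics for a rule 'X -> positive class' from its contingency counts:
    n11 = X & positive, n10 = X & negative, n01 = not-X & positive, n00 = not-X & negative."""
    n = n11 + n10 + n01 + n00
    p_x, p_y = (n11 + n10) / n, (n11 + n01) / n
    p_y_given_x = n11 / (n11 + n10) if (n11 + n10) else 0.0
    lift = p_y_given_x / p_y
    # J-measure: information the rule carries about the class
    def term(p, q):
        return 0.0 if p == 0 else p * np.log2(p / q)
    j_score = p_x * (term(p_y_given_x, p_y) + term(1 - p_y_given_x, 1 - p_y))
    # Bayes factor: P(X | positive) / P(X | negative); can be Inf, as in the table
    p_x_given_y = n11 / (n11 + n01) if (n11 + n01) else 0.0
    p_x_given_noty = n10 / (n10 + n00) if (n10 + n00) else 0.0
    bf = np.inf if p_x_given_noty == 0 else p_x_given_y / p_x_given_noty
    return bf, lift, j_score


# NMI compares the rule's predicted labels with the true labels.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0]
print(rule_metrics(n11=2, n10=0, n01=1, n00=3))
print(normalized_mutual_info_score(y_true, y_pred))
```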
Outline
Motivation: need for interpretable models
Overview of data analysis tools
Model evaluation – accuracy vs. complexity
Model evaluation – understandability
Example applications
Summary
Spam Detection (UCI ‘SPAMBASE’)
10 features: frequencies of misc. words in e-mails
Output: spam or not
[Figure: model complexity comparison]
Spam Detection – Iteration 1
The classifier labels everything as spam.
The high-confidence regions do enclose mostly spam, and within them:
Incidence of the word 'your' is low
Length of text in capital letters is high
Spam Detection – Iteration 2
The required incidence of capitals is increased.
The square region on the left also encloses examples that will be marked as 'not spam'.
Spam Detection – Iteration 3
The classifier marks everything as spam.
The frequencies of 'our' and 'hi' (e.g. the feature word_frequency_hi) determine the regions.
Example Applications
Stem Cells: explain relationships between treatment and cell evolution (sparse features)
ICU Medication: identify combinations of drugs that correlate with readmissions (class imbalance)
Fuel Consumption: identify causes of excess fuel consumption among driving behaviors (adapted regression problem)
Nuclear Threats: support interpretation of radioactivity detected in cargo containers (high-d data; train/test from different distributions)
Effects of Cell Treatment
Monitored population of cells
7 features: cycle time, area, perimeter, ...
Task: determine which cells were treated
[Figure: model complexity comparison]
MIMIC Medication Data
Information about administered medication
Features: dosage for each drug
Task: predict patient return to the ICU
[Figure: model complexity comparison]
Predicting Fuel Consumption
10 features: vehicle and driving style characteristics
Output: fuel consumption level (high/low)
[Figure: model complexity comparison]
Nuclear threat detection data
325 features
Random Forests accuracy: 0.94
Rectangular EOP accuracy: 0.881
... but the regions found across folds do not match:
Regions found in 1st iteration for Fold 0:
  incident.riidFeatures.SNR in [2.90, 9.2]
  Incident.riidFeatures.gammaDose in [0, 1.86*10^-8]
Regions found in 1st iteration for Fold 1:
  incident.rpmFeatures.gamma.sigma in [2.5, 17.381]
  incident.rpmFeatures.gammaStatistics.skewdose in [1.31, ...]
No match between the folds.
Feating and EOP
Both use decision structures to pick the right classification model.
EOP: flexible regions, decision list, models trained on subspaces.
Feating: tiles in feature space, decision tree, models trained on all features.
Summary
In most cases EOP: maintains accuracy, reduces complexity, and identifies useful aspects of the data.
EOP typically wins in terms of expressiveness.
Open questions:
What if no good low-dimensional projections exist?
What to do with inconsistent models across folds of CV?
What is the best metric of explainability?