Slide 1: Feature Selection Topics
Sergei V. Gleyzer
Data Science at the LHC Workshop, Nov. 9, 2015
Slide 2: Outline
- Motivation
- What is Feature Selection
- Feature Selection Methods
- Recent work and ideas
- Caveats
Slide 3: Motivation
Common Machine Learning (ML) problems in HEP:
- Classification, or class discrimination: is this a Higgs event or background?
- Regression, or function estimation: how to best model particle energy based on detector measurements?
Slide 4: Motivation (continued)
One of the most crucial decisions when performing data analysis is which features to use: Garbage In = Garbage Out.
Ingredients:
- Relevance to the problem
- Level of understanding of the feature
- Power of the feature and its relationship with others
Slide 5: Goal
How to:
- Select
- Assess
- Improve
the feature set used to solve the problem.
Slide 6: Example
Build a classifier to discriminate events of different classes based on event kinematics.
Typical initial feature set:
- Functions of object four-vectors in the event
- Basic kinematics: transverse momenta, invariant masses, angular separations
- More complex features relating objects in the event topology, using physics knowledge to help discriminate among classes (thrusts, helicity, etc.)
Slide 7: Initial Selection
Features are initially chosen for their individual performance: how well does X discriminate between signal and background?
Vetoes: is X well understood?
- Theoretical and other uncertainties
- Monte Carlo and data agreement
This typically yields on the order of 10-30 features (95% of use cases).
Slide 8: Feature Engineering
By combining features with each other and boosting into other frames of reference*, this set can grow quickly from tens to hundreds of features. That is fine if you have enough computational power.
Still small compared to the ~100k features of cancer or image-recognition datasets.
Balance Occam's razor against the need for additional performance/power.
* K. Black et al., JHEP 1104:069, 2011
Slide 9: Practicum
[Figure: practicum comparison, with a footnote on feature selection bias]
Slide 10: Methods
- Filters
- Wrappers
- Embedded/Hybrid
Slide 11: Filter Methods
- Filters are usually fast
- No feedback from the classifier
- Use correlations / mutual information gain
- "Quick and dirty," and less accurate
- Useful in pre-processing
- Example algorithms: information gain, Relief, rank-sum test, etc.
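A filter of this kind fits in a few lines. Below is a minimal, illustrative mutual-information ranking in Python; the toy data, variable names, and binning choice are ours, not from the talk:

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Histogram estimate of I(X;Y) in nats between a feature and a label."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal P(X)
    py = pxy.sum(axis=0, keepdims=True)   # marginal P(Y)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

# Toy data: one feature shifted by the class label, one pure noise
rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=5000).astype(float)
informative = label + rng.normal(0.0, 0.5, size=5000)
noise = rng.normal(0.0, 1.0, size=5000)

scores = {"informative": mutual_information(informative, label),
          "noise": mutual_information(noise, label)}
```

The filter never consults a classifier: features would simply be ranked by `scores` and the bottom of the list cut, which is exactly why it is fast but "quick and dirty."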
Slide 12: Wrapper Methods
Wrappers are typically slower but relatively more accurate (due to model building).
Tied to a chosen model:
- Use it to evaluate features
- Assess feature interactions
- Search for an optimal subset of features
Different types:
- Methodical
- Probabilistic (random hill-climbing)
- Heuristic (forward/backward elimination)
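The heuristic, forward-elimination flavor can be sketched as follows. Here `score` stands for whatever model-evaluation function F(S) the wrapper is tied to; the additive toy score is our stand-in, not the talk's implementation:

```python
def forward_selection(features, score, max_features):
    """Greedy heuristic wrapper: grow the subset one feature at a time,
    keeping the candidate that most improves the model score."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break  # no remaining feature improves the model
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy "classifier performance": additive feature strengths with a size penalty
strengths = {"pt": 0.50, "mass": 0.30, "angle": 0.05}
score = lambda subset: sum(strengths[f] for f in subset) - 0.1 * len(subset)
```

With these toy numbers the wrapper stops at `["pt", "mass"]`: adding "angle" costs more in the size penalty than it returns in strength, which is the behavior that distinguishes a wrapper search from a one-shot filter ranking.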
Slide 13: Example: Feature Importance
Feature importance is proportional to the performance of the classifiers in which the feature participates:
- Full feature set {V}
- Feature subsets {S}
- Classifier performance F(S)
A fast stochastic version uses random subset seeds.
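One way to read the stochastic version: credit each feature with the average F(S) over the random subsets S in which it participates. A sketch under that reading, with a toy F(S) of our choosing:

```python
import random

def stochastic_importance(features, performance, n_subsets=500, seed=1):
    """Average classifier performance F(S) over random feature subsets,
    credited to the features participating in each subset."""
    rng = random.Random(seed)
    totals = {f: 0.0 for f in features}
    counts = {f: 0 for f in features}
    for _ in range(n_subsets):
        subset = [f for f in features if rng.random() < 0.5]  # random seed subset
        if not subset:
            continue
        f_s = performance(subset)
        for f in subset:
            totals[f] += f_s
            counts[f] += 1
    return {f: totals[f] / counts[f] if counts[f] else 0.0
            for f in features}

# Toy performance: subsets containing "mass" do noticeably better
perf = lambda s: 0.6 + (0.25 if "mass" in s else 0.0)
importance = stochastic_importance(["pt", "mass", "angle"], perf)
```

Features that drive performance accumulate high averages; sampling random subsets avoids the exponential cost of evaluating every S in {S}.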
Slide 14: Example: RuleFit
RuleFit: rule-based binary classification and regression (J. Friedman).
- Transforms decision trees into rule ensembles
- A powerful classifier even if some rules are poor
Feature importance:
- Proportional to the performance of the classifiers in which the feature participates (similar to the above)
- Difference: no Wi(S); each classifier's performance is divided evenly among its participating features
Slide 15: Selection Caveats
Feature selection bias: a common mistake leading to over-optimistic evaluation of classifiers, caused by "usage" of the testing dataset during selection.
Solutions:
- M-fold cross-validation / bootstrap
- A second testing sample reserved for evaluation
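The bias is easy to demonstrate on pure noise: select features on the full dataset and then cross-validate, and a problem with no real signal looks separable; select inside each training fold and it does not. A self-contained sketch (the sample sizes and the simple nearest-centroid classifier are our choices):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 800))          # pure noise: no real class signal
y = np.repeat([0, 1], 30)

def top_k(X_tr, y_tr, k=10):
    """Simple filter: rank features by |difference of class means|."""
    gap = np.abs(X_tr[y_tr == 0].mean(0) - X_tr[y_tr == 1].mean(0))
    return np.argsort(gap)[-k:]

def centroid_accuracy(X_tr, y_tr, X_te, y_te):
    """Nearest-centroid classification accuracy on the test split."""
    c0, c1 = X_tr[y_tr == 0].mean(0), X_tr[y_tr == 1].mean(0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(int)
    return float((pred == y_te).mean())

def cv_accuracy(select_inside_fold, folds=6):
    order = np.arange(len(y))
    peeked = top_k(X, y)                 # BIASED: selection saw the test data
    accs = []
    for f in range(folds):
        te = order[f::folds]
        tr = np.setdiff1d(order, te)
        cols = top_k(X[tr], y[tr]) if select_inside_fold else peeked
        accs.append(centroid_accuracy(X[tr][:, cols], y[tr],
                                      X[te][:, cols], y[te]))
    return float(np.mean(accs))

biased = cv_accuracy(select_inside_fold=False)   # looks deceptively good
honest = cv_accuracy(select_inside_fold=True)    # near chance, as it should be
```

Keeping the selection step entirely inside each training fold is precisely the cross-validation discipline the slide prescribes.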
Slide 16: Feature Interactions
Features like to interact too.
Slide 17: Feature Interactions
- Features often interact strongly in the classification process; removing one affects the performance of its remaining interacting partners
- The strength of interaction is quantified by some wrapper methods
- In some classifiers, features can be overlooked (or shadowed) by their interacting partners
Beware of hidden reefs.
Slide 18: Selection Caveats
[Figure: feature importances before and after removing a feature]
The importance landscape has changed after the removal.
This holds for any criterion that doesn't incorporate interactions.
Slide 19: Global Loss Function
The GLoss function is a global measure of loss:
- Selects feature subsets for global removal
- Shows the amount of predictive-power loss relative to the upper bound of the performance of the remaining classifiers
Here S' is the subset to be removed.
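On one reading of this definition, the GLoss of a candidate removal set S' is the relative drop in the best performance attainable once S' is gone. A brute-force sketch of that reading (our interpretation, with a toy F, not the talk's formula):

```python
from itertools import combinations

def global_loss(features, F, removed):
    """Relative drop of the performance upper bound when the subset
    `removed` is taken out of the feature pool."""
    def best(pool):
        # upper bound of F over all non-empty subsets of the pool
        return max(F(set(c))
                   for r in range(1, len(pool) + 1)
                   for c in combinations(pool, r))
    upper = best(features)
    remaining = [f for f in features if f not in removed]
    return (upper - best(remaining)) / upper

# Toy performance function over feature subsets
F = lambda s: 0.5 + 0.2 * ("pt" in s) + 0.1 * ("mass" in s)
loss = global_loss(["pt", "mass", "angle"], F, removed={"pt"})
```

Because the loss is measured against the upper bound over the *remaining* pool, a subset can be cheap to remove globally even if it appears inside the single best classifier.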
Slide 20: Global Loss and Classifier Performance
Minimizing the GLoss function is NOT EQUIVALENT to maximizing F(S), i.e. to finding the highest-performing classifier and its constituent features.
Slide 21: Recent Work
- Probabilistic wrapper methods: stochastic approach
- Hybrid algorithms: combine filters and wrappers
- Embedded methods: feature selection during model building
Slide 22: Embedded Methods
- Assess feature importance at the model-building stage and incorporate it in the process
- A way to penalize or remove features during the classification or regression process (regularization)
- Examples: LASSO, RegTrees
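LASSO illustrates the embedded idea well: the L1 penalty drives the coefficients of unhelpful features exactly to zero, so selection happens during the fit itself. A minimal coordinate-descent sketch (in practice one would reach for a library implementation; the toy data are ours):

```python
import numpy as np

def lasso_cd(X, y, alpha, iters=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + alpha * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    norm = (X ** 2).sum(axis=0) / n
    for _ in range(iters):
        for j in range(p):
            resid = y - X @ b + X[:, j] * b[j]   # residual excluding feature j
            rho = X[:, j] @ resid / n
            # soft-thresholding: small correlations collapse to exactly zero
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / norm[j]
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=500)  # cols 2-4 irrelevant

coef = lasso_cd(X, y, alpha=0.1)
selected = [j for j in range(5) if abs(coef[j]) > 1e-6]
```

The three irrelevant columns get coefficients of exactly zero rather than merely small values: the model and the feature selection are produced by the same optimization.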
Slide 23: Regularized Trees
Inspired by the rule regularization in Friedman and Popescu (2008).
Decision tree reminder: "votes" are taken at decision junctions on possible splits among the features.
RegTrees penalize, during voting, features similar to those used in previous decisions.
The result is a classifier built on a high-quality feature set.
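One possible scoring rule consistent with this description: discount each candidate's split gain by its similarity to features already used. The gains, similarity matrix, and penalty below are invented for illustration and are not the talk's actual algorithm:

```python
def penalized_split(gains, similarity, used, penalty=0.5):
    """Pick the split feature, discounting candidates that are similar
    to features used in previous decisions."""
    def score(f):
        sim = max((similarity[f][u] for u in used), default=0.0)
        return gains[f] * (1.0 - penalty * sim)
    return max(gains, key=score)

# "ht" nearly duplicates "pt"; "angle" carries independent information
gains = {"pt": 0.50, "ht": 0.48, "angle": 0.30}
similarity = {"pt":    {"pt": 1.0, "ht": 0.9, "angle": 0.1},
              "ht":    {"pt": 0.9, "ht": 1.0, "angle": 0.1},
              "angle": {"pt": 0.1, "ht": 0.1, "angle": 1.0}}
```

With "pt" already used, the redundant "ht" loses the vote to the lower-gain but complementary "angle", which is how the tree ends up on a compact, high-quality feature set.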
Slide 24: Feature Amplification
Another example: feed feature importance back into classifier building by weighting the votes by log(feature importance).
Slide 25: Decade Ahead
(H. Prosper)
Slide 26: Minimal SUSY
Slide 27: In HEP
- Often in HEP one searches for new phenomena, applying classifiers trained on MC for at least one of the classes (signal), and sometimes both, to real data
- Flexibility is KEY to any search
- It is more beneficial to choose a reduced parameter space that consistently produces strong-performing classifiers at actual analysis time
- Useful for general SUSY and other searches for new phenomena
Slide 28: Feature Selection Tools
- R (CRAN): Boruta, RFE, CFS, FSelector, caret
- TMVA: FAST algorithm (stochastic wrapper), Global Loss function
- Scikit-Learn
- Bioconductor
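The RFE listed above (also available as `sklearn.feature_selection.RFE` in Scikit-Learn) is simple enough to sketch without any of these packages: repeatedly fit a model and drop the feature with the smallest coefficient magnitude. A dependency-free least-squares version, on toy data of our making:

```python
import numpy as np

def rfe(X, y, n_keep):
    """Recursive feature elimination sketch: refit a linear least-squares
    model and discard the feature with the smallest |coefficient|."""
    cols = list(range(X.shape[1]))
    while len(cols) > n_keep:
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        cols.pop(int(np.argmin(np.abs(coef))))  # drop the weakest feature
    return cols

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
y = X[:, 0] + 0.8 * X[:, 1] + rng.normal(0.0, 0.1, size=300)  # cols 2-5 are noise

kept = sorted(rfe(X, y, n_keep=2))
```

Refitting after each elimination lets surviving features absorb the role of discarded partners, which is why RFE counts as a wrapper rather than a one-pass filter.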
Slide 29: Summary
- Feature selection is an important part of robust HEP machine learning applications
- Many methods are available
- Watch out for the caveats
Happy analyzing!