
Presentation Transcript

Slide1

Feature Selection Topics

Sergei V. Gleyzer

Data Science at the LHC Workshop
Nov. 9, 2015

Slide2

Outline

- Motivation
- What is Feature Selection
- Feature Selection Methods
- Recent work and ideas
- Caveats

Slide3

Motivation

Common Machine Learning (ML) problems in HEP:
- Classification, or class discrimination: Higgs event or background?
- Regression, or function estimation: how best to model particle energy based on detector measurements

Slide4

Motivation continued

While performing data analysis, one of the most crucial decisions is which features to use: Garbage In = Garbage Out. Ingredients:
- Relevance to the problem
- Level of understanding of the feature
- Power of the feature and its relationship with others

Slide5

Goal

How to select, assess, and improve the feature set used to solve the problem.

Slide6

Example

Build a classifier to discriminate events of different classes based on event kinematics. A typical initial feature set:
- Functions of object four-vectors in the event
- Basic kinematics: transverse momenta, invariant masses, angular separations
- More complex features relating objects in the event topology, using physics knowledge to help discriminate among classes (thrusts, helicity, etc.)

Slide7

Initial Selection

Features are initially chosen for their individual performance: how well does X discriminate between signal and background? Vetoes: is X well-understood?
- Theoretical and other uncertainties
- Monte Carlo and data agreement

This typically yields on the order of 10-30 features (95% of use cases).

Slide8

Feature Engineering

By combining features with each other and boosting into other frames of reference*, this set can grow quickly from tens to hundreds of features. That is fine if you have enough computational power, and still small compared to the ~100k features of cancer or image-recognition datasets. Balance Occam's razor against the need for additional performance/power.

* JHEP 1104:069, 2011, K. Black et al.

Slide9

Practicum

[Figure: worked examples, with feature selection bias flagged]

Slide10

Methods

- Filters
- Wrappers
- Embedded/Hybrid

Slide11

Filter Methods

Filters are usually fast:
- No feedback from the classifier
- Use correlations/mutual information gain
- "Quick and dirty" and less accurate
- Useful in pre-processing

Example algorithms: information gain, Relief, rank-sum test, etc.
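As a minimal illustration of the filter idea above, the sketch below ranks features by a cheap statistic (absolute correlation with the binary label) with no feedback from any classifier. The dataset and the feature names ("pt", "mass") are synthetic assumptions for the example, not from the slides.

```python
# Minimal filter-method sketch: rank features by a cheap statistic
# (absolute correlation with the binary label) with no feedback from
# any classifier. The dataset and feature names are synthetic.
import random

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def filter_rank(features, labels):
    """Return feature names sorted by |correlation| with the label."""
    scores = {name: abs(correlation(col, labels))
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

random.seed(0)
labels = [random.randint(0, 1) for _ in range(200)]
features = {
    "pt":   [l + random.gauss(0, 0.3) for l in labels],  # informative
    "mass": [random.gauss(0, 1) for _ in labels],        # pure noise
}
print(filter_rank(features, labels))  # 'pt' should come first
```

Because no model is trained, this runs in a single pass over the data, which is exactly why filters suit pre-processing.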

Slide12

Wrapper Methods

Wrappers are typically slower but relatively more accurate (due to model-building). Tied to a chosen model:
- Use it to evaluate features
- Assess feature interactions
- Search for the optimal subset of features

Different types:
- Methodical
- Probabilistic (random hill-climbing)
- Heuristic (forward/backward elimination)
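The wrapper idea can be sketched as a greedy forward-selection loop. A toy nearest-centroid classifier and synthetic kinematic features stand in for a real model and dataset here.

```python
# Hedged sketch of a forward-selection wrapper: greedily add the feature
# whose inclusion most improves the model score, stopping when no feature
# helps. A toy nearest-centroid classifier stands in for a real model.
import random

def accuracy(X, y, cols):
    """Nearest-centroid accuracy on the chosen columns:
    first half of the data trains, second half tests."""
    n = len(y) // 2
    cent = {}
    for c in (0, 1):
        pts = [[X[j][k] for k in cols] for j in range(n) if y[j] == c]
        cent[c] = [sum(v) / len(pts) for v in zip(*pts)]
    hits = 0
    for j in range(n, len(y)):
        row = [X[j][k] for k in cols]
        pred = min((0, 1),
                   key=lambda c: sum((a - b) ** 2
                                     for a, b in zip(row, cent[c])))
        hits += pred == y[j]
    return hits / (len(y) - n)

def forward_select(X, y, all_cols):
    """Greedy wrapper search over feature subsets."""
    chosen, best = [], 0.0
    while True:
        remaining = [c for c in all_cols if c not in chosen]
        if not remaining:
            return chosen
        cand = max(remaining, key=lambda c: accuracy(X, y, chosen + [c]))
        score = accuracy(X, y, chosen + [cand])
        if score <= best:
            return chosen
        chosen, best = chosen + [cand], score

random.seed(1)
y = [random.randint(0, 1) for _ in range(400)]
X = [{"pt": yi + random.gauss(0, 0.5),   # informative (assumed name)
      "eta": random.gauss(0, 1),         # noise
      "phi": random.gauss(0, 1)} for yi in y]
print(forward_select(X, y, ["pt", "eta", "phi"]))
```

Every candidate evaluation retrains the model, which is why wrappers are slower than filters but see feature interactions the filter statistic misses.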

Slide13

Feature Importance

Importance is proportional to the performance of the classifiers in which a feature participates:
- Full feature set {V}
- Feature subsets {S}
- Classifier performance F(S)

A fast stochastic version uses random subset seeds.
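A simplified sketch of the stochastic version: sample randomly seeded feature subsets, score each, and credit every member feature with the subset's score. The subset scorer below is a hypothetical proxy for real classifier performance F(S), and the exact weighting used in the talk is not reproduced.

```python
# Simplified stochastic sketch of the importance idea above: sample
# randomly seeded feature subsets, score each, and credit every member
# feature with the subset's score. The scorer is a hypothetical proxy
# for real classifier performance F(S).
import random

def subset_score(subset, informative):
    """Stand-in for F(S): fraction of truly informative features in S."""
    return len(set(subset) & informative) / len(subset)

def stochastic_importance(features, informative, n_trials=2000, seed=0):
    """Average score of the random subsets each feature participates in."""
    rng = random.Random(seed)
    total = {f: 0.0 for f in features}
    count = {f: 0 for f in features}
    for _ in range(n_trials):
        subset = rng.sample(features, rng.randint(1, len(features)))
        score = subset_score(subset, informative)
        for f in subset:
            total[f] += score
            count[f] += 1
    return {f: total[f] / max(count[f], 1) for f in features}

imp = stochastic_importance(["pt", "mass", "eta", "phi"], {"pt", "mass"})
print(imp)  # 'pt' and 'mass' average higher than 'eta' and 'phi'
```

In a real application `subset_score` would train and evaluate a classifier on the subset, which is where the speed-up from random seeding matters.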

Slide14

Example: RuleFit

RuleFit is rule-based binary classification and regression (J. Friedman). It transforms decision trees into rule ensembles, giving a powerful classifier even if some individual rules are poor.

Feature importance:
- Proportional to the performance of the classifiers in which a feature participates (similar to the above)
- Difference: no Wi(S); each classifier's performance is divided evenly among its participating features

14

Slide15

Selection Caveats

Feature selection bias: a common mistake leading to over-optimistic evaluation of classifiers through "usage" of the testing dataset.

Solutions:
- M-fold cross-validation / bootstrap
- A second testing sample for evaluation
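A minimal sketch of the bias-free procedure: the selection step is re-run inside every cross-validation fold, so the held-out fold never influences which features are chosen. The `select` and `score` callables below are placeholders for a real selector and classifier.

```python
# Sketch of bias-free evaluation: feature selection is repeated inside
# each cross-validation fold, so the held-out data never influence which
# features are picked. `select` and `score` are placeholder callables.
def mfold_indices(n, m):
    """Yield (train, test) index lists for M-fold cross-validation."""
    fold = n // m
    for i in range(m):
        test = list(range(i * fold, (i + 1) * fold if i < m - 1 else n))
        train = [j for j in range(n) if j not in test]
        yield train, test

def unbiased_cv(X, y, select, score, m=5):
    """Average test score; selection sees ONLY each fold's training part."""
    results = []
    for train, test in mfold_indices(len(y), m):
        feats = select([X[j] for j in train], [y[j] for j in train])
        results.append(score(feats,
                             [X[j] for j in test],
                             [y[j] for j in test]))
    return sum(results) / len(results)

# demo with trivial stand-ins for the selection and scoring steps
est = unbiased_cv(list(range(10)), [0, 1] * 5,
                  select=lambda X, y: ["pt"],     # hypothetical choice
                  score=lambda feats, X, y: 0.5,  # dummy fixed score
                  m=5)
print(est)  # 0.5
```

The biased variant would call `select` once on the full dataset before the fold loop; moving it inside the loop is the whole fix.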

Slide16

Feature Interactions


Features Like to Interact Too

Slide17

Feature Interactions

Features often interact strongly in the classification process; removing one affects the performance of its remaining interacting partners. The strength of interaction is quantified by some wrapper methods. In some classifiers, features can be overlooked (shadowed) by their interacting partners.

Beware of hidden reefs

Slide18

Selection Caveats

[Figure: feature importance landscape before and after removing a feature]

Importance Landscape Has Changed

Holds for any criterion that doesn’t incorporate interactions

Slide19

Global Loss Function

The GLoss function is a global measure of loss. It selects feature subsets for global removal and shows the amount of predictive-power loss relative to the upper bound of performance of the remaining classifiers, where S' is the subset to be removed.

Slide20

Global Loss and Classifier Performance

Minimizing the GLoss function is NOT equivalent to maximizing F(S), i.e. finding the highest-performing classifier and its constituent features.

Slide21

Recent Work

- Probabilistic wrapper methods: stochastic approach
- Hybrid algorithms: combine filters and wrappers
- Embedded methods

Feature Selection during Model Building

Slide22

Embedded Methods

Assess feature importance at the model-building stage and incorporate it into the process; a way to penalize or remove features within the classification or regression process (regularization). Examples: LASSO, RegTrees.
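To illustrate the LASSO example, here is a tiny coordinate-descent implementation on synthetic data: the L1 penalty drives the coefficients of uninformative features exactly to zero, so selection happens during fitting, which is the embedded behaviour described above.

```python
# Minimal LASSO sketch: coordinate descent with soft-thresholding on
# synthetic data. The L1 penalty zeroes the coefficients of the two
# noise features while keeping the informative one -- selection happens
# inside the fit itself.
import random

def soft(z, g):
    """Soft-thresholding operator."""
    return z - g if z > g else z + g if z < -g else 0.0

def lasso(X, y, lam, iters=100):
    """Coordinate-descent LASSO; returns the weight vector."""
    n, p = len(y), len(X[0])
    w = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # residual with feature j's own contribution removed
            r = [y[i] - sum(w[k] * X[i][k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            norm = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft(rho, lam) / norm
    return w

random.seed(2)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [2.0 * row[0] + random.gauss(0, 0.1) for row in X]  # only feature 0 matters
w = lasso(X, y, lam=0.3)
print(w)  # w[1] and w[2] end up exactly 0
```

The soft-thresholding step is what makes coefficients exactly zero rather than merely small, unlike an L2 (ridge) penalty.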

Slide23

Regularized Trees

Inspired by the rule regularization in Friedman and Popescu (2008). Decision-tree reminder: "votes" are taken at decision junctions on possible splits among the features. RegTrees penalize, during voting, features similar to those used in previous decisions, ending up with a classifier built on a high-quality feature set.
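A closely related formulation (regularized trees in the style of Deng and Runger) multiplies the split gain of any feature not already used in the tree by a factor lambda < 1, steering the tree toward a compact feature set. The sketch below uses that variant with invented gain numbers.

```python
# Sketch of the voting penalty: when choosing a split, the gain of any
# feature NOT already used in the tree is multiplied by lam < 1, so the
# tree prefers reusing a compact, high-quality feature set. The gain
# values are illustrative, not taken from a real tree.
def penalized_choice(gains, used, lam=0.7):
    """Pick the split feature after penalizing previously unused ones."""
    scored = {f: (g if f in used else lam * g) for f, g in gains.items()}
    return max(scored, key=scored.get)

gains = {"pt": 0.30, "mass": 0.38}      # raw impurity gains (invented)
print(penalized_choice(gains, {"pt"}))              # 0.30 vs 0.266 -> 'pt'
print(penalized_choice(gains, {"pt"}, lam=1.0))     # no penalty  -> 'mass'
```

With the penalty applied, the already-used feature "pt" wins the vote even though "mass" has the higher raw gain.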

Slide24

Feature Amplification

Another example: feed feature importance back into classifier building.

Weight votes by log(feature importance)
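A toy illustration of that weighting rule: each feature casts a +1 (signal) or -1 (background) vote, weighted by the log of its importance. The per-feature votes and importance scores are invented for the example; scores greater than 1 are assumed so the logarithms stay positive.

```python
# Toy illustration of the vote-weighting rule above: each feature casts
# a +1 (signal) or -1 (background) vote, weighted by log(importance).
# Importance scores and votes are invented; scores > 1 keep logs positive.
import math

def amplified_vote(votes, importance):
    """Weighted-majority decision over per-feature votes."""
    total = sum(v * math.log(importance[f]) for f, v in votes.items())
    return 1 if total > 0 else -1

importance = {"pt": 8.0, "mass": 3.0, "eta": 1.5}  # assumed scores
decision = amplified_vote({"pt": 1, "mass": -1, "eta": -1}, importance)
print(decision)  # the single high-importance vote outweighs two: 1
```

Here log(8.0) ≈ 2.08 exceeds log(3.0) + log(1.5) ≈ 1.50, so the one high-importance feature overrules the two dissenters.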

Slide25

Decade Ahead

H. Prosper

Slide26

Minimal SUSY

Slide27

In HEP

Often in HEP one searches for new phenomena, applying classifiers trained on MC for at least one of the classes (signal), or sometimes both, to real data. Flexibility is KEY to any search: it is more beneficial to choose a reduced parameter space that consistently produces strong-performing classifiers at actual analysis time. This is useful for general SUSY and other new-phenomena searches.

Slide28

Feature Selection Tools

- R (CRAN): Boruta, RFE, CFS, FSelector, caret
- TMVA: FAST algorithm (stochastic wrapper), Global Loss function
- Scikit-Learn
- Bioconductor

Slide29

Summary

Feature selection is an important part of robust HEP machine-learning applications. Many methods are available; watch out for the caveats. Happy ANALYZING!