Feature Selection in Classification

Uploaded On 2018-10-07

Presentation Transcript

Slide1

Feature Selection in Classification and R Packages

Houtao Deng
houtao_deng@intuit.com

Data Mining with R
12/13/2011

Slide2

Agenda

Concept of feature selection
Feature selection methods
The R packages for feature selection

Slide3

The need for feature selection

An illustrative example: online shopping prediction

Customer | Page 1 | Page 2 | Page 3 | … | Page 10,000 | Buy a Book
1        | 1      | 3      | 1      | … | 1           | Yes
2        | 2      | 1      | 0      | … | 2           | Yes
3        | 2      | 0      | 0      | … | 0           | No
…        | …      | …      | …      | … | …           | …

Features (predictive variables, attributes): Page 1 to Page 10,000
Class: Buy a Book

Difficult to understand with all 10,000 pages as features.
Maybe only a small number of pages are needed, e.g. pages related to books and placing orders.

Slide4

Feature selection

All features -> Feature selection -> Feature subset -> Classifier

Benefits
Easier to understand
Less overfitting
Save time and space

Applications
Genomic analysis
Text classification
Marketing analysis
Image classification
…

Accuracy is often used to evaluate the feature selection method used.

Slide5

Feature selection methods

Univariate filter methods
Consider one feature's contribution to the class at a time, e.g. information gain, chi-square

Advantages
Computationally efficient and parallelizable

Disadvantages
May select low-quality feature subsets
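
As a rough illustration of a univariate filter, the sketch below scores each feature by its information gain against the class, one feature at a time. The toy data and the names `entropy`, `information_gain`, `page_books` and `page_news` are made up for this example and do not come from the slides.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Class entropy minus the entropy left after splitting on the feature's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remaining = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remaining


# Hypothetical toy data: did each of four customers visit a page (1) or not (0)?
features = {
    "page_books": [1, 1, 0, 0],
    "page_news": [1, 0, 1, 0],
}
buy = ["Yes", "Yes", "No", "No"]

# Each feature is scored on its own, so this loop could be run in parallel.
scores = {name: information_gain(column, buy) for name, column in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because each score ignores the other features, two redundant copies of the same page would both rank highly, which is one way a univariate filter can end up with a low-quality subset.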

Slide6

Feature selection methods

Multivariate filter methods
Consider the contribution of a set of features to the class variable, e.g.
CFS (correlation feature selection) [M Hall, 2000]
FCBF (fast correlation-based filter) [Lei Yu et al., 2003]

Advantages
Computationally efficient
Select higher-quality feature subsets than univariate filters

Disadvantages
Not optimized for a given classifier
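
To make "the contribution of a set of features" concrete, CFS scores a subset of k features with the merit k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation. The sketch below is a minimal version under one loudly stated assumption: it uses absolute Pearson correlation as a stand-in for the correlation measure, which is not necessarily what a given CFS implementation uses. The data and function names are made up for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def cfs_merit(subset, target):
    """Merit = k * r_cf / sqrt(k + k*(k-1) * r_ff) for a list of feature columns.

    Assumption: absolute Pearson correlation stands in for the correlation measure.
    """
    k = len(subset)
    r_cf = sum(abs(pearson(f, target)) for f in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(a, b)) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


# Hypothetical toy columns
target = [0, 0, 1, 1]
relevant = [0, 0, 1, 1]   # perfectly correlated with the class
duplicate = [0, 0, 1, 1]  # redundant copy of `relevant`
noise = [0, 1, 0, 1]      # uncorrelated with the class and with `relevant`

merit_single = cfs_merit([relevant], target)
merit_redundant = cfs_merit([relevant, duplicate], target)
merit_noisy = cfs_merit([relevant, noise], target)
print(merit_single, merit_redundant, merit_noisy)
```

In this toy case the redundant copy does not raise the merit and the noise column lowers it, which is the behaviour that lets a multivariate filter prefer a small, non-redundant subset.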

Slide7

Feature selection methods

Wrapper methods
Select a feature subset by building classifiers, e.g.
LASSO (least absolute shrinkage and selection operator) [R Tibshirani, 1996]
SVM-RFE (SVM with recursive feature elimination) [I Guyon et al., 2002]
RF-RFE (random forest with recursive feature elimination) [R Uriarte et al., 2006]
RRF (regularized random forest) [H Deng et al., 2011]

Advantages
Select high-quality feature subsets for a particular classifier

Disadvantages
RFE methods are relatively computationally expensive.
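
The recursive feature elimination loop behind SVM-RFE and RF-RFE can be sketched as: train a classifier, rank the features by the classifier's own importance measure, drop the weakest one, and retrain. The sketch below is not SVM-RFE or RF-RFE themselves. To stay self-contained it swaps in a tiny logistic regression without an intercept and ranks features by absolute weight. The data, `train_logistic` and `rfe` are hypothetical names made up for this example.

```python
from math import exp


def train_logistic(rows, labels, steps=200, lr=0.5):
    """Fit logistic-regression weights (no intercept) by batch gradient descent."""
    n, d = len(rows), len(rows[0])
    w = [0.0] * d
    for _ in range(steps):
        errors = []
        for x, y in zip(rows, labels):
            z = sum(wj * xj for wj, xj in zip(w, x))
            errors.append(1.0 / (1.0 + exp(-z)) - y)
        for j in range(d):
            grad = sum(e * x[j] for e, x in zip(errors, rows)) / n
            w[j] -= lr * grad
    return w


def rfe(rows, labels, n_keep):
    """Retrain repeatedly, each time dropping the feature with the smallest |weight|."""
    kept = list(range(len(rows[0])))
    while len(kept) > n_keep:
        sub = [[x[j] for j in kept] for x in rows]
        w = train_logistic(sub, labels)  # one full retraining per eliminated feature
        weakest = min(range(len(kept)), key=lambda i: abs(w[i]))
        del kept[weakest]
    return kept


# Hypothetical data: feature 0 separates the classes, feature 1 is weakly
# related to the class, feature 2 is pure noise.
rows = [
    [1, 0.5, 1], [1, 0.5, -1],
    [1, 0.5, 1], [1, 0.5, -1],
    [1, -0.5, 1], [1, -0.5, -1],
    [-1, -0.5, 1], [-1, -0.5, -1],
    [-1, -0.5, 1], [-1, -0.5, -1],
    [-1, 0.5, 1], [-1, 0.5, -1],
]
labels = [1] * 6 + [0] * 6

print(rfe(rows, labels, n_keep=2))
print(rfe(rows, labels, n_keep=1))
```

The retraining inside the `while` loop is where the cost comes from: eliminating features one at a time from d features means on the order of d model fits.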

Slide8

Feature selection methods

Select an appropriate wrapper method for a given classifier

Feature selection method | Classifier
LASSO                    | Logistic regression
RRF, RF-RFE              | Tree models such as random forest, boosted trees, C4.5
SVM-RFE                  | SVM

Slide9

R packages

RWeka package
An R interface to Weka
A large number of feature selection algorithms
    Univariate filters: information gain, chi-square, etc.
    Multivariate filters: CFS, etc.
    Wrappers: SVM-RFE

FSelector package
Inherits a few feature selection methods from RWeka.

Slide10

R packages

glmnet package
LASSO (least absolute shrinkage and selection operator)
Main parameter: penalty parameter 'lambda'

RRF package
RRF (regularized random forest)
Main parameter: coefficient of regularization 'coefReg'

varSelRF package
RF-RFE (random forest with recursive feature elimination)
Main parameter: number of trees used in each elimination iteration 'ntreeIterat'
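
To give a feel for why the penalty 'lambda' is the main LASSO parameter: in the special case of an orthonormal design and the objective 0.5 * ||y - Xb||^2 + lambda * ||b||_1, each LASSO coefficient is the least-squares coefficient shrunk toward zero by lambda, and set exactly to zero when its magnitude is at most lambda (soft-thresholding). The sketch below is in Python, does not call glmnet, and uses made-up coefficients. It only illustrates how a larger penalty leaves fewer selected features, under that orthonormal-design assumption.

```python
def soft_threshold(coef, lam):
    """Shrink a coefficient toward zero by lam; return exactly 0.0 if |coef| <= lam."""
    if coef > lam:
        return coef - lam
    if coef < -lam:
        return coef + lam
    return 0.0


def selected_features(coefs, lam):
    """Indices of coefficients that remain nonzero after soft-thresholding."""
    return [j for j, c in enumerate(coefs) if soft_threshold(c, lam) != 0.0]


# Hypothetical least-squares coefficients for four features
ols_coefs = [2.5, -1.2, 0.3, -0.1]

for lam in (0.0, 0.2, 0.5, 2.0):
    print(lam, selected_features(ols_coefs, lam))
```

In practice the penalty is usually chosen by cross-validation rather than fixed by hand.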

Slide11

Examples

Consider LASSO, CFS (correlation feature selection), RRF (regularized random forest), RF-RFE (random forest with RFE)
In all data sets, only 2 out of 100 features are needed for classification.

Data set           | Methods
Linearly separable | LASSO, CFS, RF-RFE, RRF
XOR data           | RRF, RF-RFE
Nonlinear          | CFS, RF-RFE, RRF
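
Only the tree-based methods are listed for the XOR data. One plausible reason (an interpretation, not something stated on the slide) is that in XOR data each relevant feature is uncorrelated with the class on its own, so a method that relies on per-feature linear association has nothing to pick up, even though the two features together determine the class. The sketch below checks this on a made-up eight-row XOR data set; all names are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical XOR data: the class is 1 exactly when the two features differ.
x1 = [0, 0, 1, 1] * 2
x2 = [0, 1, 0, 1] * 2
label = [a ^ b for a, b in zip(x1, x2)]

# Individually, neither feature is linearly associated with the class.
r1 = pearson(x1, label)
r2 = pearson(x2, label)

# Jointly, every (x1, x2) combination maps to a single class.
classes_by_pair = {}
for a, b, y in zip(x1, x2, label):
    classes_by_pair.setdefault((a, b), set()).add(y)
jointly_determined = all(len(c) == 1 for c in classes_by_pair.values())

print(r1, r2, jointly_determined)
```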