Package 'FADA'

December 10, 2019

Type: Package
Title: Variable Selection for Supervised Classification in High Dimension
Version: 1.3.5
Date: 2019-12-10
Author: Emeline Perthame (Institut Pasteur, Paris, France), Chloe Friguet (Universite de Bretagne Sud, Vannes, France) and David Causeur (Agrocampus Ouest, Rennes, France)
Maintainer: David Causeur <david.causeur@agrocampus-ouest.fr>
Description: The functions provided in the FADA (Factor Adjusted Discriminant Analysis) package aim at performing supervised classification of high-dimensional and correlated profiles. The procedure combines a decorrelation step based on a factor modeling of the dependence among covariates and a classification method. The available methods are Lasso regularized logistic model (see Friedman et al. (2010)), sparse linear discriminant analysis (see Clemmensen et al. (2011)), shrinkage linear and diagonal discriminant analysis (see M. Ahdesmaki et al. (2010)). More methods of classification can be used on the decorrelated data provided by the package FADA.
License: GPL (>= 2)
Depends: MASS, elasticnet
Imports: sparseLDA, sda, glmnet, mnormt, crossval, corpcor, matrixStats, methods
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2019-12-10 15:30:05 UTC

R topics documented:

    FADA-package
    data.test
    data.train

    decorrelate.test
    decorrelate.train
    FADA


FADA-package    Variable selection for supervised classification in high dimension

Description

    The functions provided in the FADA (Factor Adjusted Discriminant Analysis) package aim at performing supervised classification of high-dimensional and correlated profiles. The procedure combines a decorrelation step based on a factor modeling of the dependence among covariates and a classification method. The available methods are Lasso regularized logistic model (see Friedman et al. (2010)), sparse linear discriminant analysis (see Clemmensen et al. (2011)), shrinkage linear and diagonal discriminant analysis (see M. Ahdesmaki et al. (2010)). More methods of classification can be used on the decorrelated data provided by the package FADA.

Details

    Package: FADA
    Type: Package
    Version: 1.2
    Date: 2014-10-08
    License: GPL (>= 2)

    The functions available in this package are used in this order:

    • Step 1: Decorrelation of the training dataset using a factor model of the covariance by the decorrelate.train function. The number of factors of the model can be estimated or forced.
    • Step 2: If needed, decorrelation of the testing dataset by using the decorrelate.test function and the estimated factor model parameters provided by decorrelate.train.
    • Step 3: Estimation of a supervised classification model using the decorrelated training dataset by the FADA function. One can choose among several classification methods (more details in the manual of FADA function).
    • Step 4: If needed, computation of the error rate by the FADA function, either using a supplementary test dataset or by K-fold cross-validation.

Author(s)

    Emeline Perthame (Agrocampus Ouest, Rennes, France), Chloe Friguet (Universite de Bretagne Sud, Vannes, France) and David Causeur (Agrocampus Ouest, Rennes, France)

    Maintainer: David Causeur, http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/david.causeur, mailto:david.causeur@agrocampus-ouest.fr

References

    Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.

    Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.

    Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

    Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

    Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

Examples

    ### Not run
    ### example of an entire analysis with FADA package if a testing data set is available
    ### loading data
    # data(data.train)
    # data(data.test)
    # dim(data.train$x) # 30 250
    # dim(data.test$x) # 1000 250
    ### decorrelation of the training data set
    # res = decorrelate.train(data.train) # Optimal number of factors is 3
    ### decorrelation of the testing data set afterward
    # res2 = decorrelate.test(res, data.test)
    ### classification step with sda, using local false discovery rate for variable selection
    ### linear discriminant analysis
    # FADA.LDA = FADA(res2, method="sda", sda.method="lfdr")
    ### diagonal discriminant analysis
    # FADA.DDA = FADA(res2, method="sda", sda.method="lfdr", diagonal=TRUE)

    ### example of an entire analysis with FADA package if no testing data set is available
    ### loading data
    # data(data.train)
    ### decorrelation step
    # res = decorrelate.train(data.train) # Optimal number of factors is 3
    ### classification step with sda, using local false discovery rate for variable selection
    ### linear discriminant analysis, error rate is computed by 10-fold CV (20 replications of the CV)
    # FADA.LDA = FADA(res, method="sda", sda.method="lfdr")


data.test    Test dataset simulated with the same distribution as the training dataset data.train

Description

    The test dataset has the same list structure as the training dataset data.train. Only the number of rows of the x component and the length of the y component differ, since the test sample size is 1000.

Usage

    data(data.test)

Format

    List with 2 components: x, the 1000 x 250 matrix of simulated explanatory variables

    and y, the 1000 x 1 grouping variable (coded 1 and 2).

Examples

    data(data.test)
    dim(data.test$x) # 1000 250
    data.test$y # 2 levels


data.train    Training data

Description

    Simulated training dataset. The x component is a matrix of explanatory variables, with 30 rows and 250 columns. Each row is simulated according to a multinormal distribution whose mean depends on a group membership given by the y component. The variance matrix is the same within each group.

Usage

    data(data.train)

Format

    A list with 2 components. x is a 30 x 250 matrix of simulated explanatory variables. y is a 30 x 1 grouping variable (coded 1 and 2).

Examples

    data(data.train)
    dim(data.train$x) # 30 250
    data.train$y # 2 levels
    hist(cor(data.train$x[data.train$y==1,])) # high dependence
    hist(cor(data.train$x[data.train$y==2,]))


decorrelate.test    Factor Adjusted Discriminant Analysis 2: Decorrelation of a testing data set after running the decorrelate.train function on a training data set

Description

    This function decorrelates the test dataset by adjusting data for the effects of latent factors of dependence, after running the decorrelate.train function on a training data set.

Usage

    decorrelate.test(faobject, data.test)

Arguments

    faobject     An object returned by function decorrelate.train.
    data.test    A list containing the testing dataset, with the following component: x is a n x p matrix of explanatory variables, where n stands for the testing sample size and p for the number of explanatory variables.

Value

    Returns a list with the following elements:

    meanclass               Group means estimated after iterative decorrelation
    fa.training             Decorrelated training data
    fa.testing              Decorrelated testing data
    Psi                     Estimation of the factor model parameters: specific variance
    B                       Estimation of the factor model parameters: loadings
    factors.training        Scores of the training individuals on the factors
    factors.testing         Scores of the testing individuals on the factors
    groups                  Recall of group variable of training data
    proba.training          Internal value (estimation of individual probabilities for the training dataset)
    proba.testing           Internal value (estimation of individual probabilities for the testing dataset)
    mod.decorrelate.test    Internal value (classification model)

Author(s)

    Emeline Perthame, Chloe Friguet and David Causeur

References

    Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

    Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

    Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

See Also

    FADA-package, FADA, glmnet-package

Examples

    data(data.train)
    data(data.test)
    fa = decorrelate.train(data.train)
    fa2 = decorrelate.test(fa, data.test)
    names(fa2)


decorrelate.train    Factor Adjusted Discriminant Analysis 1: Decorrelation of the training data

Description

    This function decorrelates the training dataset by adjusting data for the effects of latent factors of dependence.

Usage

    decorrelate.train(data.train, nbf = NULL, maxnbfactors = 12,
        diagnostic.plot = FALSE, min.err = 0.001, verbose = TRUE,
        EM = TRUE, maxiter = 15, ...)

Arguments

    data.train         A list containing the training dataset with the following components: x is the n x p matrix of explanatory variables, where n stands for the training sample size and p for the number of explanatory variables; y is a numeric vector giving the group of each individual, numbered from 1 to K.
    nbf                Number of factors. If nbf = NULL, the number of factors is estimated. nbf can also be set to a positive integer value. If nbf = 0, the data are not factor-adjusted.
    maxnbfactors       The maximum number of factors. Default is maxnbfactors = 12.
    diagnostic.plot    If diagnostic.plot = TRUE, the values of the variance inflation criterion are plotted for each number of factors. Default is diagnostic.plot = FALSE. This option might be helpful to manually determine the optimal number of factors.
    min.err            Threshold of convergence of the algorithm criterion. Default is min.err = 0.001.
    verbose            Print out number of factors and values of the objective criterion along the iterations. Default is TRUE.
    EM                 The method used to estimate the parameters of the factor model. If EM = TRUE, parameters are estimated by an EM algorithm. Setting EM = TRUE is recommended when the number of covariates exceeds the number of observations. If EM = FALSE, the parameters are estimated by maximum likelihood using factanal. Default is EM = TRUE.
    maxiter            Maximum number of iterations for estimation of the factor model.
    ...                Other arguments that can be passed to the cv.glmnet and glmnet functions from the glmnet package. These functions are used to estimate individual group probabilities. Modifying these parameters should not affect the decorrelation procedure. However, the argument nfolds in cv.glmnet is set to 10 by default and should be reduced (minimum 3) for large datasets, in order to decrease the computation time of decorrelate.train.

Value

    Returns a list with the following elements:

    meanclass           Group means estimated after iterative decorrelation
    fa.training         Decorrelated training data
    Psi                 Estimation of the factor model parameters: specific variance
    B                   Estimation of the factor model parameters: loadings
    factors.training    Scores of the training individuals on the factors
    groups              Recall of group variable of training data
    proba.training      Internal value (estimation of individual probabilities for the training dataset)

Author(s)

    Emeline Perthame, Chloe Friguet and David Causeur

References

    Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

    Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

    Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

See Also

    FADA-package, FADA, glmnet-package, factanal

Examples

    data(data.train)
    res0 = decorrelate.train(data.train, nbf = 3) # when the number of factors is forced
    res1 = decorrelate.train(data.train) # when the optimal number of factors is unknown


FADA    Factor Adjusted Discriminant Analysis 3-4: Supervised classification on decorrelated data

Description

    This function performs supervised classification on factor-adjusted data.

Usage

    FADA(faobject, K = 10, B = 20, nbf.cv = NULL,
        method = c("glmnet", "sda", "sparseLDA"),
        sda.method = c("lfdr", "HC"), alpha = 0.1, ...)

Arguments

    faobject      An object returned by function decorrelate.train or decorrelate.test.
    K             Number of folds to estimate classification error rate, only when no testing data is provided. Default is K = 10.
    B             Number of replications of the cross-validation. Default is B = 20.
    nbf.cv        Number of factors for cross validation to compute error rate, only when no testing data is provided. By default, nbf.cv = NULL and the number of factors is estimated for each fold of the cross validation. nbf.cv can also be set to a positive integer value. If nbf.cv = 0, the data are not factor-adjusted.
    method        The method used to fit the supervised classification model. 3 options are available. If method = "glmnet", a Lasso penalized logistic regression is performed using the glmnet R package. If method = "sda", a LDA or DDA (see the diagonal argument) is performed by Shrinkage Discriminant Analysis using the sda R package. If method = "sparseLDA", a Lasso penalized LDA is performed using the sparseLDA R package.
    sda.method    The method used for variable selection, only if method = "sda". If sda.method = "lfdr", variables are selected through CAT scores and False Non Discovery Rate control. If sda.method = "HC", the variable selection method is Higher Criticism Thresholding.
    alpha         The proportion of the HC objective to be observed, only if method = "sda" and sda.method = "HC". Default is 0.1.
    ...           Some arguments to tune the classification method. See the documentation of the chosen method (glmnet, sda or sparseLDA) for more information about these parameters.

Value

    Returns a list with the following elements:

    method          Recall of the classification method
    selected        A vector containing the indices of the selected variables
    proba.train     A matrix containing predicted group frequencies of training data.
    proba.test      A matrix containing predicted group frequencies of testing data, if a testing data set has been provided
    predict.test    A matrix containing predicted classes of testing data, if a testing data set has been provided
    cv.error        A numeric value containing the average classification error, computed by cross validation, if no testing data set has been provided
    cv.error.se     A numeric value containing the standard error of the classification error, computed by cross validation, if no testing data set has been provided
    mod             The classification model performed. The class of this element is the class of a model returned by the chosen method. See the documentation of the chosen method for more details.

Author(s)

    Emeline Perthame, Chloe Friguet and David Causeur

References

    Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.

    Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.

    Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.

    Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.

    Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

See Also

    FADA, decorrelate.train, decorrelate.test, sda, sda-package, glmnet-package

Examples

    data(data.train)
    data(data.test)

    # When testing data set is provided
    res = decorrelate.train(data.train)
    res2 = decorrelate.test(res, data.test)
    classif = FADA(res2, method="sda", sda.method="lfdr")

    ### Not run
    # When no testing data set is provided
    # Classification error rate is computed by a K-fold cross validation.
    # res = decorrelate.train(data.train)
    # classif = FADA(res, method="sda", sda.method="lfdr")
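
When a testing data set is provided, the Value section of FADA lists predict.test (predicted classes) but no ready-made error rate. The following is a minimal sketch, not part of the package, of how a test error rate might be derived from that element. The helper misclassification_rate is a hypothetical name introduced here for illustration, and the sketch assumes that predict.test holds one predicted class per row of data.test$x, in the same order as data.test$y.

```r
# Hypothetical helper (not part of FADA): share of misclassified observations.
# Both inputs are flattened and compared as character labels, so numeric,
# factor and one-column matrix inputs are all accepted.
misclassification_rate <- function(predicted, observed) {
  predicted <- as.character(as.vector(predicted))
  observed <- as.character(as.vector(observed))
  stopifnot(length(predicted) == length(observed))
  mean(predicted != observed)
}

# Toy check on hand-made labels: one mismatch out of four observations.
toy_rate <- misclassification_rate(c(1, 2, 2, 1), c(1, 2, 1, 1))
print(toy_rate)

# Application to the package workflow, run only if FADA is installed.
# Assumption: classif$predict.test is aligned with data.test$y.
if (requireNamespace("FADA", quietly = TRUE)) {
  library(FADA)
  data(data.train)
  data(data.test)
  res <- decorrelate.train(data.train)
  res2 <- decorrelate.test(res, data.test)
  classif <- FADA(res2, method = "sda", sda.method = "lfdr")
  print(misclassification_rate(classif$predict.test, data.test$y))
}
```

Comparing labels as character strings avoids surprises when one side is a factor and the other numeric. If the alignment assumption does not hold for a given method, the comparison should be adapted accordingly.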
