Package 'FADA'
December 10, 2019

Type: Package
Title: Variable Selection for Supervised Classification in High Dimension
Version: 1.3.5
Date: 2019-12-10
Author: Emeline Perthame (Institut Pasteur, Paris, France), Chloe Friguet (Universite de Bretagne Sud, Vannes, France) and David Causeur (Agrocampus Ouest, Rennes, France)
Maintainer: David Causeur <david.causeur@agrocampus-ouest.fr>
Description: The functions provided in the FADA (Factor Adjusted Discriminant Analysis) package aim at performing supervised classification of high-dimensional and correlated profiles. The procedure combines a decorrelation step based on a factor modeling of the dependence among covariates and a classification method. The available methods are Lasso regularized logistic model (see Friedman et al. (2010)), sparse linear discriminant analysis (see Clemmensen et al. (2011)), shrinkage linear and diagonal discriminant analysis (see M. Ahdesmaki et al. (2010)). More methods of classification can be used on the decorrelated data provided by the package FADA.
License: GPL (>= 2)
Depends: MASS, elasticnet
Imports: sparseLDA, sda, glmnet, mnormt, crossval, corpcor, matrixStats, methods
NeedsCompilation: no
Repository: CRAN
Date/Publication: 2019-12-10 15:30:05 UTC

R topics documented:
    FADA-package
    data.test
    data.train
    decorrelate.test
    decorrelate.train
    FADA
    Index


FADA-package    Variable selection for supervised classification in high dimension

Description

The functions provided in the FADA (Factor Adjusted Discriminant Analysis) package aim at performing supervised classification of high-dimensional and correlated profiles. The procedure combines a decorrelation step based on a factor modeling of the dependence among covariates and a classification method. The available methods are Lasso regularized logistic model (see Friedman et al. (2010)), sparse linear discriminant analysis (see Clemmensen et al. (2011)), shrinkage linear and diagonal discriminant analysis (see M. Ahdesmaki et al. (2010)). More methods of classification can be used on the decorrelated data provided by the package FADA.

Details

    Package: FADA
    Type: Package
    Version: 1.2
    Date: 2014-10-08
    License: GPL (>= 2)

The functions available in this package are used in this order:

Step 1: Decorrelation of the training data set using a factor model of the covariance by the decorrelate.train function. The number of factors of the model can be estimated or forced.
Step 2: If needed, decorrelation of the testing data set by using the decorrelate.test function and the estimated factor model parameters provided by decorrelate.train.
Step 3: Estimation of a supervised classification model using the decorrelated training data set by the FADA function. One can choose among several classification methods (more details in the manual of the FADA function).
Step 4: If needed, computation of the error rate by the FADA function, either using a supplementary test data set or by K-fold cross-validation.

Author(s)

Emeline Perthame (Agrocampus Ouest, Rennes, France), Chloe Friguet (Universite de Bretagne Sud, Vannes, France) and David Causeur (Agrocampus Ouest, Rennes, France)
Maintainer: David Causeur, http://math.agrocampus-ouest.fr/infoglueDeliverLive/membres/david.causeur, mailto:david.causeur@agrocampus-ouest.fr

References

Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.
Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.
Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.
Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

Examples

    ### Not run
    ### example of an entire analysis with FADA package if a testing data set is available
    ### loading data
    # data(data.train)
    # data(data.test)
    # dim(data.train$x) # 30 250
    # dim(data.test$x) # 1000 250
    ### decorrelation of the training data set
    # res = decorrelate.train(data.train) # Optimal number of factors is 3
    ### decorrelation of the testing data set afterward
    # res2 = decorrelate.test(res, data.test)
    ### classification step with sda, using local false discovery rate for variable selection
    ### linear discriminant analysis
    # FADA.LDA = FADA(res2, method = "sda", sda.method = "lfdr")
    ### diagonal discriminant analysis
    # FADA.DDA = FADA(res2, method = "sda", sda.method = "lfdr", diagonal = TRUE)

    ### example of an entire analysis with FADA package if no testing data set is available
    ### loading data
    ### decorrelation step
    # res = decorrelate.train(data.train) # Optimal number of factors is 3
    ### classification step with sda, using local false discovery rate for variable selection
    ### linear discriminant analysis, error rate is computed by 10-fold CV (20 replications of the CV)
    # FADA.LDA = FADA(res, method = "sda", sda.method = "lfdr")


data.test    Test data set simulated with the same distribution as the training data set data.train.

Description

The test data set has the same list structure as the training data set data.train. Only the number of rows of the x component and the length of the y component are different, since the test sample size is 1000.

Usage

    data(data.test)

Format

List with 2 components: x, the 1000 x 250 matrix of simulated explanatory variables
and y, the 1000 x 1 grouping variable (coded 1 and 2).

Examples

    data(data.test)
    dim(data.test$x) # 1000 250
    data.test$y # 2 levels


data.train    Training data

Description

Simulated training data set. The x component is a matrix of explanatory variables, with 30 rows and 250 columns. Each row is simulated according to a multinormal distribution whose mean depends on a group membership given by the y component. The variance matrix is the same within each group.

Usage

    data(data.train)

Format

A list with 2 components. x is a 30 x 250 matrix of simulated explanatory variables. y is a 30 x 1 grouping variable (coded 1 and 2).

Examples

    data(data.train)
    dim(data.train$x) # 30 250
    data.train$y # 2 levels
    hist(cor(data.train$x[data.train$y == 1, ])) # high dependence
    hist(cor(data.train$x[data.train$y == 2, ]))


decorrelate.test    Factor Adjusted Discriminant Analysis 2: Decorrelation of a testing data set after running the decorrelate.train function on a training data set

Description

This function decorrelates the test data set by adjusting data for the effects of latent factors of dependence, after running the decorrelate.train function on a training data set.

Usage

    decorrelate.test(faobject, data.test)

Arguments

faobject
    An object returned by function decorrelate.train.
data.test
    A list containing the testing data set, with the following component: x is an n x p matrix of explanatory variables, where n stands for the testing sample size and p for the number of explanatory variables.

Value

Returns a list with the following elements:

meanclass
    Group means estimated after iterative decorrelation
fa.training
    Decorrelated training data
fa.testing
    Decorrelated testing data
Psi
    Estimation of the factor model parameters: specific variance
B
    Estimation of the factor model parameters: loadings
factors.training
    Scores of the training individuals on the factors
factors.testing
    Scores of the testing individuals on the factors
groups
    Recall of group variable of training data
proba.training
    Internal value (estimation of individual probabilities for the training data set)
proba.testing
    Internal value (estimation of individual probabilities for the testing data set)
mod.decorrelate.test
    Internal value (classification model)

Author(s)

Emeline Perthame, Chloe Friguet and David Causeur

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.
Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

See Also

FADA-package, FADA, glmnet-package

Examples

    data(data.train)
    data(data.test)
    fa = decorrelate.train(data.train)
    fa2 = decorrelate.test(fa, data.test)
    names(fa2)


decorrelate.train    Factor Adjusted Discriminant Analysis 1: Decorrelation of the training data

Description

This function decorrelates the training data set by adjusting data for the effects of latent factors of dependence.

Usage

    decorrelate.train(data.train, nbf = NULL, maxnbfactors = 12, diagnostic.plot = FALSE,
                      min.err = 0.001, verbose = TRUE, EM = TRUE, maxiter = 15, ...)

Arguments

data.train
    A list containing the training data set with the following components: x is the n x p matrix of explanatory variables, where n stands for the training sample size and p for the number of explanatory variables; y is a numeric vector giving the group of each individual, numbered from 1 to K.
nbf
    Number of factors. If nbf = NULL, the number of factors is estimated. nbf can also be set to a positive integer value. If nbf = 0, the data are not factor-adjusted.
maxnbfactors
    The maximum number of factors. Default is maxnbfactors = 12.
diagnostic.plot
    If diagnostic.plot = TRUE, the values of the variance inflation criterion are plotted for each number of factors. Default is diagnostic.plot = FALSE. This option might be helpful to manually determine the optimal number of factors.
min.err
    Threshold of convergence of the algorithm criterion. Default is min.err = 0.001.
verbose
    Print out number of factors and values of the objective criterion along the iterations. Default is TRUE.
EM
    The method used to estimate the parameters of the factor model. If EM = TRUE, parameters are estimated by an EM algorithm. Setting EM = TRUE is recommended when the number of covariates exceeds the number of observations. If EM = FALSE, the parameters are estimated by maximum likelihood using factanal. Default is EM = TRUE.
maxiter
    Maximum number of iterations for estimation of the factor model.
...
    Other arguments that can be passed to the cv.glmnet and glmnet functions from the glmnet package. These functions are used to estimate individual group probabilities. Modifying these parameters should not affect the decorrelation procedure. However, the argument nfolds in cv.glmnet is set to 10 by default and should be reduced (minimum 3) for large data sets, in order to decrease the computation time of decorrelate.train.

Value

Returns a list with the following elements:

meanclass
    Group means estimated after iterative decorrelation
fa.training
    Decorrelated training data
Psi
    Estimation of the factor model parameters: specific variance
B
    Estimation of the factor model parameters: loadings
factors.training
    Scores of the training individuals on the factors
groups
    Recall of group variable of training data
proba.training
    Internal value (estimation of individual probabilities for the training data set)

Author(s)

Emeline Perthame, Chloe Friguet and David Causeur

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.
Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

See Also

FADA-package, FADA, glmnet-package, factanal

Examples

    data(data.train)
    res0 = decorrelate.train(data.train, nbf = 3) # when the number of factors is forced
    res1 = decorrelate.train(data.train) # when the optimal number of factors is unknown


FADA    Factor Adjusted Discriminant Analysis 3-4: Supervised classification on decorrelated data

Description

This function performs supervised classification on factor-adjusted data.

Usage

    FADA(faobject, K = 10, B = 20, nbf.cv = NULL, method = c("glmnet", "sda", "sparseLDA"),
         sda.method = c("lfdr", "HC"), alpha = 0.1, ...)

Arguments

faobject
    An object returned by function decorrelate.train or decorrelate.test.
K
    Number of folds to estimate the classification error rate, only when no testing data is provided. Default is K = 10.
B
    Number of replications of the cross-validation. Default is B = 20.
nbf.cv
    Number of factors for cross-validation to compute the error rate, only when no testing data is provided. By default, nbf.cv = NULL and the number of factors is estimated for each fold of the cross-validation. nbf.cv can also be set to a positive integer value. If nbf.cv = 0, the data are not factor-adjusted.
method
    The method used to fit the supervised classification model. 3 options are available. If method = "glmnet", a Lasso penalized logistic regression is performed using the glmnet R package. If method = "sda", an LDA or DDA (see the diagonal argument) is performed by Shrinkage Discriminant Analysis using the sda R package. If method = "sparseLDA", a Lasso penalized LDA is performed using the sparseLDA R package.
sda.method
    The method used for variable selection, only if method = "sda". If sda.method = "lfdr", variables are selected through CAT scores and False Non Discovery Rate control. If sda.method = "HC", the variable selection method is Higher Criticism Thresholding.
alpha
    The proportion of the HC objective to be observed, only if method = "sda" and sda.method = "HC". Default is 0.1.
...
    Some arguments to tune the classification method. See the documentation of the chosen method (glmnet, sda or sparseLDA) for more information about these parameters.

Value

Returns a list with the following elements:

method
    Recall of the classification method
selected
    A vector containing the index of the selected variables
proba.train
    A matrix containing predicted group frequencies of training data.
proba.test
    A matrix containing predicted group frequencies of testing data, if a testing data set has been provided
predict.test
    A matrix containing predicted classes of testing data, if a testing data set has been provided
cv.error
    A numeric value containing the average classification error, computed by cross-validation, if no testing data set has been provided
cv.error.se
    A numeric value containing the standard error of the classification error, computed by cross-validation, if no testing data set has been provided
mod
    The classification model performed. The class of this element is the class of a model returned by the chosen method. See the documentation of the chosen method for more details.

Author(s)
Emeline Perthame, Chloe Friguet and David Causeur

References

Ahdesmaki, M. and Strimmer, K. (2010), Feature selection in omics prediction problems using cat scores and false non-discovery rate control. Annals of Applied Statistics, 4, 503-519.
Clemmensen, L., Hastie, T., Witten, D. and Ersboll, B. (2011), Sparse discriminant analysis. Technometrics, 53(4), 406-413.
Friedman, J., Hastie, T. and Tibshirani, R. (2010), Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1-22.
Friguet, C., Kloareg, M. and Causeur, D. (2009), A factor model approach to multiple testing under dependence. Journal of the American Statistical Association, 104:488, 1406-1415.
Perthame, E., Friguet, C. and Causeur, D. (2015), Stability of feature selection in classification issues for high-dimensional correlated data, Statistics and Computing.

See Also

FADA, decorrelate.train, decorrelate.test, sda, sda-package, glmnet-package

Examples

    data(data.train)
    data(data.test)

    # When testing data set is provided
    res = decorrelate.train(data.train)
    res2 = decorrelate.test(res, data.test)
    classif = FADA(res2, method = "sda", sda.method = "lfdr")

    ### Not run
    # When no testing data set is provided
    # Classification error rate is computed by a K-fold cross validation.
    # res = decorrelate.train(data.train)
    # classif = FADA(res, method = "sda", sda.method = "lfdr")


Index

data.test, data.train, decorrelate.test, decorrelate.train, factanal, FADA, FADA-package, glmnet, sda
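The decorrelation idea documented for decorrelate.train (adjusting the data for the effects of latent factors of dependence) can be illustrated with base R alone. The sketch below is not the package's implementation and does not call FADA. It simulates two groups that share two latent factors, fits a factor model with stats::factanal (the function the manual names for EM = FALSE) and subtracts the estimated common-factor part. The simulated data, the helper names decorrelate_sketch and mean_abs_cor, the use of known group labels for centering, and the choice of regression factor scores are all assumptions made for illustration. Unlike the EM option of the package, factanal needs more observations than covariates, so the example uses n = 200 and p = 10.

```r
# Illustrative sketch only: base R, not the FADA implementation.
set.seed(42)

# Simulate n observations of p covariates that share q latent factors.
n <- 200; p <- 10; q <- 2
y <- rep(1:2, each = n / 2)                          # group labels coded 1 and 2
B <- cbind(c(rep(0.9, 5), rep(0, 5)),                # hypothetical loadings (p x q)
           c(rep(0, 5), rep(0.9, 5)))
Z <- matrix(rnorm(n * q), n, q)                      # latent factor scores
mu <- rbind(rep(0, p), c(rep(1, 3), rep(0, p - 3)))  # group means (2 x p)
x <- mu[y, ] + Z %*% t(B) + matrix(rnorm(n * p, sd = 0.5), n, p)

# Hypothetical helper: remove the estimated common-factor part of x.
decorrelate_sketch <- function(x, y, nbf) {
  group_means <- apply(x, 2, function(col) ave(col, y))
  centered <- x - group_means                        # within-group centering
  fit <- factanal(centered, factors = nbf, scores = "regression")
  # factanal standardizes the variables, so rescale to the original units.
  sds <- apply(centered, 2, sd)
  common <- sweep(fit$scores %*% t(unclass(fit$loadings)), 2, sds, "*")
  x - common
}

x_adj <- decorrelate_sketch(x, y, nbf = 2)

# Mean absolute off-diagonal correlation within group 1, before and after.
mean_abs_cor <- function(m) {
  r <- cor(m)
  mean(abs(r[upper.tri(r)]))
}
before <- mean_abs_cor(x[y == 1, ])
after  <- mean_abs_cor(x_adj[y == 1, ])
c(before = before, after = after)
```

The last line prints the two summary values, so the effect of the adjustment on the within-group dependence can be checked directly. With the package itself, the corresponding objects would be data.train$x and the fa.training element returned by decorrelate.train.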