The University of New Mexico

Uploaded On 2015-10-10



Presentation Transcript

How Many Bootstrap Replicates are Necessary?

Nicholas D. Pattengale (*) 1, Masoud Alipour 2, Olaf R.P. Bininda-Emonds 3, Bernard M.E. Moret 2,4, Alexandros Stamatakis 5

1 Department of Computer Science, University of New Mexico, Albuquerque NM, USA
2 Laboratory for Computational Biology and Bioinformatics, EPFL, Switzerland
3 AG Systematik und Evolutionsbiologie, Institut für Biologie und Umweltwissenschaften, University of Oldenburg, Germany
4 Swiss Institute of Bioinformatics, Lausanne, Switzerland
5 The Exelixis Lab, Department of Computer Science, TU München, Germany

Main Result / Contribution
- Two criteria for stopping numbers in phylogenetic bootstrapping
- First empirical assessment of variability in support value, as a function of replicate count, in bootstrapping
- Validate our proposals for stopping criteria

Table of Contents
- Background
  - Phylogeny and Splits
  - The Phylogenetic Bootstrap
  - Stopping Numbers
- Our Technique
  - The Framework
  - Motivation: Permutation Test
  - Frequency Criterion (FC)
  - Weighted Criterion (WC)
- The Experiment(s)
- Conclusion

Phylogenetic Reconstruction
[Figure: tree relating Human, Chimpanzee, and Orangutan]

Canonical Representation - Splits
[Figure: tree on the taxa A, B, C, D, E, F]
- AB|CDEF
- ABE|CDF
- DF|ABCE
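The split representation can be illustrated with a short Python sketch. It is not taken from the talk: the nested-tuple tree encoding, the function names, and the specific topology (chosen only because it yields the three splits listed on the slide) are assumptions made for illustration. Each split is stored as the side that contains the alphabetically smallest taxon, so the same split always gets the same representation.

```python
def leaves(tree):
    """Return the set of leaf names below a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial splits of a tree given as nested tuples."""
    taxa = leaves(tree)
    anchor = min(taxa)  # canonical side = the side containing this taxon
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        rest = taxa - clade
        # Skip trivial splits (a single leaf, or the whole taxon set).
        if len(clade) >= 2 and len(rest) >= 2:
            found.add(clade if anchor in clade else rest)
        return clade

    visit(tree)
    return found


def format_split(side, taxa):
    """Render a split as 'side|complement' with taxa in sorted order."""
    return "".join(sorted(side)) + "|" + "".join(sorted(taxa - side))


# Hypothetical topology consistent with the slide's three splits.
tree = ((("A", "B"), "E"), ("C", ("D", "F")))
taxa = leaves(tree)
for split in splits(tree):
    print(format_split(split, taxa))
```

Under this convention the split written DF|ABCE on the slide is printed as ABCE|DF; the two strings name the same bipartition. Re-rooting the same tree leaves the resulting set of splits unchanged, which is what makes the representation canonical.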

The Phylogenetic Bootstrap
- So you've reconstructed a tree via MP (maximum parsimony) or ML (maximum likelihood)...
- and you'd like to assess how well your data supports your tree
- One answer: the phylogenetic bootstrap

The Phylogenetic Bootstrap (worked example)
[Figure: reference tree on the taxa A, B, C, D, E with two internal edges. Each edge carries a running count of the replicate trees that contain it.]

Original Data
            0  1  2  3
Species A   C  C  T  C
Species B   A  C  T  G
Species C   C  -  -  G
Species D   A  C  C  C
Species E   A  G  -  C

Bootstrap 1 (sampled columns 1, 3, 1, 3; edge counts so far: 0 and 1)
Species A   C  C  C  C
Species B   C  G  C  G
Species C   -  G  -  G
Species D   C  C  C  C
Species E   G  C  G  C

Bootstrap 2 (sampled columns 2, 1, 0, 0; edge counts so far: 1 and 2)
Species A   T  C  C  C
Species B   T  C  A  A
Species C   -  -  C  C
Species D   C  C  A  A
Species E   -  G  A  A

Bootstrap 3 (sampled columns 0, 0, 3, 0; edge counts so far: 2 and 2)
Species A   C  C  C  C
Species B   A  A  G  A
Species C   C  C  G  C
Species D   A  A  C  A
Species E   A  A  C  A

Bootstrap 4 (sampled columns 2, 1, 0, 2; edge counts so far: 3 and 3)
Species A   T  C  C  T
Species B   T  C  A  T
Species C   -  -  C  -
Species D   C  C  A  C
Species E   -  G  A  -

Bootstrap 5 (sampled columns 2, 2, 1, 3; edge counts so far: 4 and 3)
Species A   T  T  C  C
Species B   T  T  C  G
Species C   -  -  -  G
Species D   C  C  C  C
Species E   -  -  G  C

Final support values on the two edges after 5 replicates: 0.8 and 0.6

The Phylogenetic Bootstrap
- Motivated by a resampling technique from statistics
  - Used to assess the stability of simple summary statistics
- Computationally expensive: days to months

Stopping Numbers
- Question we address: how many replicates?
- Theory exists for simpler estimators
- In phylogeny, the estimator is not only complex, but the number of bipartitions grows
- State of the art in phylogeny: choose arbitrarily
- Hedges chooses a priori for a given level of significance
  - but ignores factors which greatly influence the estimator (the tree search algorithm) and hence the stability of BS replicates

Our Framework
- Major goal: not be biased by the current best tree
- Devise an adaptive criterion, to be used at run time
- Based on a Permutation Test
  - Typically used to reject that two samples arise from the same distribution
  - We use it to assess when a population subset sufficiently resembles the full population

Our Framework: Bootstop()
- With m replicates
- Repeat p = 100 times
  - randomly split into two sets (of size m/2)
  - score similarity between the two sets
- Assess
  - If 99/100 scores beat the threshold: DONE
  - Else: increment m (by, e.g., 50)
- Well less than 2^n possible
- Our two approaches differ in their definition of similarity

Scoring (Dis)similarity
- Frequency Criterion (FC)
  - Build vectors of edge support for the two subsets
  - Take Pearson's Correlation Coefficient between the two vectors
- Weighted Criterion (WC)
  - Build (Majority Rules) Consensus trees for the two subsets
  - Take Weighted RF distance between the two trees
- What is the difference?
  - WC takes into account phylogenetically meaningful [bipartitions]
  - WC is more conservative, but also sensitive

Experimental Design
For 17 diverse, real-world datasets with
- 125 to 2,554 taxa
- hundreds to tens of thousands of columns
we did the following:
- Generated 10,000 BS replicates (serves as m → ∞)
- Applied our criteria to generate stopping numbers
- Assessed the quality of our stopping numbers w.r.t. the 10,000-tree set

Results
Stopping numbers
- FC: 150, 150, 150, 200, 200, 200, 200, 200, 250, 250, 250, 250, 250, 300, 300, 300, 450
- WC: 50, 200, 300, 350, 400, 400, 400, 400, 450, 450, 500, 550, 600, 600, 650, 700, 1200
- Widely varying, dataset dependent (especially with WC)
- Correlation of support values always exceeds 99.5%
- WRF is smaller than the specified WC threshold value in all cases
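The column resampling in the worked bootstrap example can be reproduced with a few lines of Python. This is a minimal sketch, not code from the talk: the dictionary layout and the function names are assumptions. The alignment and the sampled column indices are the ones shown on the slides.

```python
import random

# Original data from the worked example: one string per species, one
# character per alignment column (columns are numbered from 0).
alignment = {
    "Species A": "CCTC",
    "Species B": "ACTG",
    "Species C": "C--G",
    "Species D": "ACCC",
    "Species E": "AG-C",
}


def resample_columns(alignment, columns):
    """Build a replicate alignment from the given column indices."""
    return {name: "".join(seq[c] for c in columns)
            for name, seq in alignment.items()}


def bootstrap_replicate(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return columns, resample_columns(alignment, columns)


# Bootstrap 1 on the slides samples columns 1, 3, 1, 3.
replicate_1 = resample_columns(alignment, [1, 3, 1, 3])

# A fresh random replicate; the seed is arbitrary.
columns, replicate = bootstrap_replicate(alignment, random.Random(42))
```

A tree is then inferred from each replicate alignment, and the support of an edge in the reference tree is the fraction of replicate trees that contain the corresponding split.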
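The Bootstop() loop with the Frequency Criterion can be sketched in Python as follows. This is a hedged illustration under several assumptions, not the authors' implementation: each replicate tree is represented as a set of split labels, the callback `next_replicates`, the cap `max_replicates`, and the correlation threshold of 0.99 are all hypothetical choices (the slides give p = 100, the 99/100 rule, and the example step of 50, but no threshold value), and constant support vectors are scored as perfectly similar by convention.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Pearson correlation; constant vectors score 1.0 only if identical."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 1.0 if x == y else 0.0
    return cov / math.sqrt(var_x * var_y)


def support_vectors(half_a, half_b):
    """Per-split support frequencies in each half, over the same split list."""
    count_a = Counter(s for tree in half_a for s in tree)
    count_b = Counter(s for tree in half_b for s in tree)
    all_splits = list(set(count_a) | set(count_b))
    return ([count_a[s] / len(half_a) for s in all_splits],
            [count_b[s] / len(half_b) for s in all_splits])


def fc_converged(replicates, permutations=100, required=99,
                 threshold=0.99, rng=None):
    """Permutation test: do random halves of the replicates agree?"""
    rng = rng or random.Random(0)
    half = len(replicates) // 2
    if half == 0:
        return False
    passed = 0
    for _ in range(permutations):
        shuffled = rng.sample(replicates, len(replicates))
        vec_a, vec_b = support_vectors(shuffled[:half], shuffled[half:2 * half])
        if pearson(vec_a, vec_b) >= threshold:
            passed += 1
    return passed >= required


def bootstop(next_replicates, step=50, max_replicates=1000, **criterion):
    """Add `step` replicates at a time until the criterion is met."""
    replicates = []
    while len(replicates) < max_replicates:
        replicates.extend(next_replicates(step))
        if fc_converged(replicates, **criterion):
            break
    return len(replicates)
```

Here `next_replicates(k)` stands for whatever produces k more bootstrap trees, for example a tree search on k freshly resampled alignments. Swapping `pearson` on support vectors for a distance between two consensus trees would give the shape of the Weighted Criterion, with the comparison reversed because a distance must fall below its threshold.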
Results
[Figure: Pearson correlation (y-axis, 0.984 to 1) against number of trees (x-axis, log scale, 100 to 10,000); series labeled 404, 994, 2308, 218]
[Figure: FC criterion value (y-axis, 0.94 to 1) against number of trees (x-axis, log scale, 100 to 10,000); series labeled 404, 994, 2308, 218]
[Figure: Weighted Robinson-Foulds (y-axis, 0 to 0.07) against number of trees (x-axis, log scale, 100 to 10,000); series labeled 404, 994, 2308, 218]
[Figure: WC criterion value (y-axis, 0 to 0.09) against number of trees (x-axis, log scale, 100 to 10,000); series labeled 404, 994, 2308, 218]

Conclusion
- First large-scale empirical study of bootstrapping convergence
  - Used biological datasets that cover a wide range of input alignment sizes and a broad variety of organisms and genes
- Developed and assessed two bootstopping criteria
- Two criteria
  - Can be computed at run time
  - Do not rely on externally provided reference trees
  - Designed to capture a stopping point providing sufficient accuracy for unambiguous biological interpretation of the resulting consensus trees or best-known ML trees with support values
- WC criterion yields better performance and higher accuracy than FC
  - Correlates very well with the mean error of support values on the best-scoring tree
- Advocate the use of WC over FC
  - Takes into account the BS support of "important" bipartitions which are subject to biological interpretation
- Highly dataset dependent
  - Only compute as many trees as needed
- Better methods (and ideally, some supporting theory) may exist

That's All Folks
Thanks
- to my collaborators
- to the organizers
- for listening!

Stopping Criteria are part of RAxML 7.1.0alpha
http://wwwkramer.in.tum.de/exelixis/software.html

Data for this study is also available
http://lcbb.epfl.ch/BS.tar.bz2