PEWO: a collection of workflows to benchmark phylogenetic placement

Benjamin Linard 1,2,*, Nikolai Romashchenko 1, Fabio Pardi 1 and Eric Rivals 1,3

1 LIRMM, University of Montpellier, CNRS, Montpellier, France
2 SPYGEN, 17 Rue du Lac Saint-André, 73370 Le Bourget-du-Lac, France
3 Institut Français de Bioinformatique, CNRS UMS 3601, Évry, France
* To whom correspondence should be addressed.

Abstract

Motivation: Phylogenetic placement (PP) is a process of taxonomic identification for which several tools are now available. However, it remains difficult to assess which tool is best suited to particular genomic data or a particular reference taxonomy. We developed PEWO, the first benchmarking tool dedicated to PP assessment. Its automated workflows can evaluate PP at many levels, from parameter optimisation for a particular tool to the selection of the most appropriate genetic marker when PP-based species identifications are targeted. Our goal is that PEWO will become a community effort and a standard support for future developments and applications of PP.

Availability: https://github.com/phylo42/PEWO

Contact: benjamin.linard@lirmm.fr; rivals@lirmm.fr

Supplementary: Supplementary data is available at page 4.

1 Introduction

When a reference phylogeny is available, taxonomic identification of biological sequences can be achieved with phylogenetic placement (PP). PP provides the most informative type of classification because each query sequence is assigned to its putative origin in the tree. PP can be applied in many contexts, including community ecology, species diversity, or medical studies. Several PP tools were developed for these purposes (Matsen et al., 2010; Berger et al., 2011; Mirarab et al., 2012; Zheng et al., 2018), with four recent tools capable of processing larger sequence volumes (Barbera et al., 2018; Linard et al., 2019; Czech and Stamatakis, 2019; Balaban et al., 2020).

In the preliminary phase of experimental design, assessing which tools answer the needs of a given application remains a tedious task, often involving manual tests (Mangul et al., 2019). Strikingly, PP has a broad range of applications but lacks user guidelines and benchmarking. Some procedures to evaluate PP accuracy were proposed (Matsen et al., 2010), but never automated via dedicated software. Benchmarking is essential to determine which tool best suits a given metagenomic task or a specific dataset (Sczyrba et al., 2017).

To fill this gap, we developed PEWO (Placement Evaluation WOrkflows), the first tool dedicated to PP benchmarking. PEWO automates existing evaluation procedures (which were not implemented for the community) and introduces novel procedures. Beyond benchmarking, PEWO can help decision-making in any metagenomic or metabarcoding project for PP-based taxonomic identification. With applications ranging from parameter optimization on particular genomic data to the selection of the most appropriate genetic marker, PEWO provides the user community with standardized workflows for easy and reproducible assessment of PP analyses.

2 Overview

PEWO implements evaluation workflows in Python and Snakemake (Köster and Rahmann, 2012), whose framework ensures flexibility, platform independence, and reproducibility. Each workflow automatically performs multiple steps, from query generation up to summary plots/tables, and can be tailored via Snakemake configuration files. PEWO and its dependencies are easily installed via a conda virtual environment.

Currently, PEWO incorporates five state-of-the-art PP tools, which cover a majority of PP uses: EPA (RAxML), PPlacer, EPA-ng, RAPPAS and APPLES. Four are alignment-based tools, while RAPPAS is alignment-free. As input, each workflow takes a phylogenetic tree and the reference multiple sequence alignment from which it was built (Figure 1). Optionally, the user can provide a set of query sequences. Below we describe the workflows and some of their applications.

2.1 PEWO procedures

Pruning-based accuracy evaluation (PAC): in this standard procedure for assessing placement accuracy (Matsen et al., 2010; Berger et al., 2011), a subset of sequences is randomly pruned from the reference phylogeny and alignment. Each pruned sequence then serves to generate queries for placement, and the accuracy of each tool is measured as the number of nodes separating the predicted placement from the true placement. PEWO offers two versions of this topological metric: Node Distance (ND) and expected Node Distance (eND). The eND accounts for placement uncertainty (e.g. likelihood weight ratios). All selected tools are compared for a user-selected combination of parameters.

Likelihood-based accuracy evaluation (LAC) is a new, faster evaluation procedure introduced in PEWO to assess the relative accuracy of PP. It iterates the following process for a set of queries: place the query, extend the phylogeny to include that query, optimize the branch lengths of this extended tree, and return its log-likelihood (LL). The user can then compare the LL values obtained with different tools, or with different settings of the same tool (e.g. by inspecting the distribution of the differences between LL values obtained with two different tools). See the Supplementary Materials for a more detailed description.

Fig. 1. A. Overview of PEWO inputs and outputs. B. An example of plots dynamically generated by the PAC (Pruning-based Accuracy Evaluation) procedure on a 16S rRNA bacterial reference. Measured mean expected Node Distances (eND) are reported (lower value = better accuracy). Panels report selected conditions for PPlacer and RAPPAS, e.g. different parameter values tested in different rows and columns. For PPlacer, varying parameters are ms (max-strikes, X axis) and sb (strike-box, Y axis). Parameter mp (max-pitches, grey box) is fixed. For RAPPAS, varying parameters are k (phylo-kmer size) and o (omega threshold). Parameters red (alignment reduction) and ar (software used for ancestral reconstruction) are fixed. C. Four PAC procedures were run for different Coleopteran mitogenome loci (rows) and compiled. Average expected Node Distance (eND) is measured for three tools (columns) using default parameters. For each locus, the lowest average eND is highlighted in bold. For RAPPAS, the last column shows that accuracy can be improved when increasing k-mer size (default is k=8). Examples B. and C. are more extensively discussed in Supplementary Materials.

Resource evaluation (RES): outputs the runtime and memory usage of selected tools, with details for each placement step (e.g., profile alignment, database construction, placement...). One can compare the impact on time and memory of tool-specific parameter combinations while searching for an appropriate accuracy/resource trade-off, or evaluate the tools' scalability with respect to input size.

2.2 Applications

PEWO procedures cover numerous use cases arising with PP, as illustrated by six exemplar applications provided on GitHub (two are reported in Figure 1B-C). As new PP tools can be incorporated in PEWO, PEWO procedures enable comparing existing and future tools on resource usage, scalability, or accuracy in a reproducible way.

With PEWO, users can optimize their PP pipeline design: for instance, for a given reference (tree and alignment), they can determine which tool and parameter combination will maximize placement accuracy, and at which computational cost. PEWO facilitates such tests, as in Figure 1B, which shows two plots automatically generated by the PAC procedure running PPlacer and RAPPAS for 9 and 6 parameter combinations, respectively.

As a second example, we show how PEWO can be used to compare different genetic markers available for the same taxa, as the choice of the marker may impact the accuracy of placement. For example, we evaluated the placements for four loci (16S, 12S, cox1, cyt) on their associated phylogeny for 900 Coleopteran mitochondrial genomes (Linard et al., 2018). Figure 1C displays the results (reproducible via GitHub example 4), highlighting that: i) 12S yields the most accurate placements, despite being the second shortest locus, ii) the tool achieving the best accuracy depends on the marker, and iii) with RAPPAS, a longer k-mer size is required to obtain accuracy similar to or better than that of alignment-based methods.

2.3 Availability and implementation

PEWO, with full documentation and example workflows, is freely available from its repository URL: https://github.com/phylo42/PEWO. Its modular, well-documented, and evolvable source code enables the community to easily extend it by adding new tools, procedures, or metrics. Notably, users can develop their own evaluation procedures, starting from PEWO Snakemake rules as templates for their own workflows. Any PP tool can be integrated as long as it outputs results in jplace format (a JSON specification, standard in PP; see Matsen et al., 2012), can be parameterized via the command line, and is available on a conda or pip repository (see the documentation for guidelines).

3 Conclusion

Reproducibility of computational analyses in life sciences is a crucial issue, even more so when large-scale data come into play, as in the case of metagenomics. With PEWO, we provide a resource that facilitates the evaluation and comparison of PP tools under a unified framework. It combines flexibility and extensibility with ease of use, while it inherits a standardized installation procedure from the conda framework. The set of workflows in PEWO aims to grow as a community effort, and extensions are welcome. In PEWO, we introduce a likelihood-based accuracy evaluation procedure, which is complementary to existing procedures (Matsen et al., 2010). PEWO will help the community in its efforts to develop future PP tools and will facilitate experimental decisions when PP is chosen as a means of species identification. With the help of future contributors, we hope that PEWO will evolve as a standard for PP benchmarking, and answer forthcoming, unforeseen yet auspicious applications.

Acknowledgements

We thank Vincent Lefort for technical assistance, the ATGC bioinformatic platform, and the Institut Français de Bioinformatique [ANR-11-INBS-0013].

Funding

This work has been supported by France Génomique [ANR-10-INBS-0009], MNERT fellowship to NR.

References

Balaban, M. et al. (2020). Apples: scalable distance-based phylogenetic placement with or without alignments. Systematic Biology, 69(3), 566–578.
Barbera, P. et al. (2018). EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences. Systematic Biology, 68(2), 365–369.
Berger, S. A. et al. (2011). Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood. Systematic Biology, 60(3), 291–302.
Czech, L. and Stamatakis, A. (2019). Scalable methods for analyzing and visualizing phylogenetic placement of metagenomic samples. PLOS ONE, 14(5), e0217050.
Köster, J. and Rahmann, S. (2012). Snakemake - a scalable bioinformatics workflow engine. Bioinformatics, 28(19), 2520–2522.
Linard, B. et al. (2018). The contribution of mitochondrial metagenomics to large-scale data mining and phylogenetic analysis of coleoptera. Molecular Phylogenetics and Evolution, 128, 1–11.
Linard, B. et al. (2019). Rapid alignment-free phylogenetic identification of metagenomic sequences. Bioinformatics, 35(18), 3303–3312.
Mangul, S. et al. (2019). Systematic benchmarking of omics computational tools. Nature Communications, 10(1).
Matsen, F. A. et al. (2010). pplacer: Linear time maximum-likelihood and bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11(1), 538.
Matsen, F. A. et al. (2012). A format for phylogenetic placements. PLoS ONE, 7(2), e31009.
Mirarab, S. et al. (2012). SEPP: SATé-enabled phylogenetic placement. In R. B. Altman, A. K. Dunker, L. Hunter, T. Murray, and T. E. Klein, editors, Biocomputing 2012: Proceedings of the Pacific Symposium, Kohala Coast, Hawaii, USA, January 3-7, 2012, pages 247–258. World Scientific Publishing.
Sczyrba, A. et al. (2017). Critical assessment of metagenome interpretation - a benchmark of metagenomics software. Nature Methods, 14(11), 1063–1071.
Zheng, Q. et al. (2018). HmmUFOtu: An HMM and phylogenetic placement based ultra-fast taxonomic assignment and OTU picking tool for microbiome amplicon sequencing studies. Genome Biology, 19(1).
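
Illustrative note on the ND and eND metrics

The PAC procedure in Section 2.1 scores a query by the number of nodes between its predicted and true placement (ND), and the eND additionally accounts for placement uncertainty such as likelihood weight ratios (LWR). The Python sketch below shows one plausible reading of these two metrics. It is not taken from the PEWO source code: the function names, the input layout (a list of (node distance, LWR) pairs per query), and the normalization by the sum of LWRs are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# One candidate placement of a query: (node distance to the true branch, LWR).
# This layout is hypothetical; PEWO itself reads placements from jplace files.
Placement = Tuple[int, float]


def node_distance(placements: Iterable[Placement]) -> int:
    """ND sketch: node distance of the single best-supported placement."""
    candidates: List[Placement] = list(placements)
    if not candidates:
        raise ValueError("a query needs at least one placement")
    # Pick the placement with the highest likelihood weight ratio.
    best = max(candidates, key=lambda p: p[1])
    return best[0]


def expected_node_distance(placements: Iterable[Placement]) -> float:
    """eND sketch: LWR-weighted mean of node distances over all placements.

    Dividing by the total LWR is an assumption, made so that truncated
    placement lists (whose LWRs do not sum to 1) stay comparable.
    """
    candidates: List[Placement] = list(placements)
    total_weight = sum(lwr for _, lwr in candidates)
    if total_weight <= 0:
        raise ValueError("likelihood weight ratios must sum to a positive value")
    weighted = sum(dist * lwr for dist, lwr in candidates)
    return weighted / total_weight


if __name__ == "__main__":
    # Made-up placements for one query: the best branch is correct (distance 0),
    # but 30% of the weight sits on branches 2 and 5 nodes away.
    query = [(0, 0.7), (2, 0.2), (5, 0.1)]
    print("ND :", node_distance(query))
    print("eND:", expected_node_distance(query))
```

In this made-up example the ND is 0 because the best-supported placement is correct, while the eND is close to 0.9 because the remaining weight lies on more distant branches. This is the kind of uncertainty the eND is meant to capture.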
