Probabilistic Reconstruction of Ancestral Gene Orders with Insertions and Deletions

Fei Hu, Jun Zhou, Lingxi Zhou and Jijun Tang

Abstract: Changes of gene orderings have been extensively used as a signal to reconstruct phylogenies and ancestral genomes. Inferring the gene order of an extinct species has a wide range of applications, including the potential to reveal more detailed evolutionary histories, to determine gene content and ordering, and to understand the consequences of structural changes for organismal function and species divergence. In this study, we propose a new adjacency-based method, PMAG+, to infer ancestral genomes under a more general model of gene evolution involving gene insertions and deletions (indels) in addition to gene rearrangements. PMAG+ improves on our previous method PMAG by developing a new approach to infer ancestral gene contents and by reducing the adjacency assembly problem to an instance of the Traveling Salesman Problem (TSP). We designed a series of experiments to extensively validate PMAG+ and compared the results with the most recent comparable method, GapAdj. According to the results, the ancestral gene contents predicted by PMAG+ coincide closely with the actual contents, with error rates below 1%. Under various degrees of indels, PMAG+ consistently achieves more accurate prediction of ancestral gene orders and, at the same time, produces contigs very close to the actual chromosomes.

Index Terms: Ancestral Genome, Gene Order, Genome Rearrangement, Gene Insertion, Gene Deletion
1 INTRODUCTION

Gene order data have proved very useful in phylogenetic reconstruction, but determining the ancestral orders and orientations of genes is still far from solved. In recent years, reconstructing the hypothetical gene orders of ancestors, both with and without a given speciation history, has been studied. If the speciation history is given (in the form of a binary tree), the problem of finding the ancestors at the non-leaf nodes is defined as the small phylogeny problem (SPP); starting instead from a set of related species, the big phylogeny problem (BPP) searches for the phylogenetic tree along with all the ancestors in the tree. Current methods to solve the SPP are either event-based or adjacency-based. Event-based methods seek an assignment of gene orders to each ancestor such that the number of evolutionary events is minimized. These methods are very expensive and may fail to find a solution even after months of computation. To overcome this problem, several adjacency-based methods were proposed, which compute the score or probability of each gene adjacency and assemble the individual adjacencies into a valid permutation of gene order based on their scores or probabilities.
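The adjacency-based idea can be made concrete with a small sketch. The Python code below is not taken from the paper: it extracts signed adjacencies from linear chromosomes and then assembles them with a simple score-ordered greedy rule that uses each gene extremity at most once and refuses to close cycles. That greedy rule is a swapped-in illustration, not the exact heuristic of InferCarsPro or PMAG, and PMAG+ itself replaces greedy assembly with a TSP formulation. The head/tail convention, the hypothetical helper names (ends, extract_adjacencies, greedy_assemble) and the use of raw adjacency frequencies in place of posterior probabilities are all assumptions; telomeres are ignored.

```python
from collections import Counter
from typing import Dict, List, Tuple

Extremity = Tuple[str, str]  # (gene name, "h" for head or "t" for tail)


def ends(signed: str) -> Tuple[Extremity, Extremity]:
    """Return the (left, right) extremities of a signed gene such as "+a" or "-c".

    Assumed convention: +g is read head -> tail, -g is read tail -> head."""
    gene = signed[1:]
    if signed[0] == "-":
        return (gene, "t"), (gene, "h")
    return (gene, "h"), (gene, "t")


def extract_adjacencies(chromosome: List[str]) -> List[frozenset]:
    """Adjacencies of one linear chromosome as unordered extremity pairs.

    Unordered pairs make (i, j) and (-j, -i) the same adjacency.
    Telomeres are deliberately ignored in this sketch."""
    return [frozenset({ends(a)[1], ends(b)[0]})
            for a, b in zip(chromosome, chromosome[1:])]


def greedy_assemble(genes: List[str],
                    scores: Dict[frozenset, float]) -> List[List[str]]:
    """Accept adjacencies in order of decreasing score, skipping any that reuse
    an extremity or would close a cycle, then read off the contigs."""
    partner: Dict[Extremity, Extremity] = {}
    parent = {g: g for g in genes}  # union-find over genes

    def find(g: str) -> str:
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for adj, _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        x, y = tuple(adj)
        if x in partner or y in partner:
            continue  # each extremity joins at most one adjacency
        rx, ry = find(x[0]), find(y[0])
        if rx == ry:
            continue  # both genes already share a contig: this would close a cycle
        partner[x], partner[y] = y, x
        parent[rx] = ry

    contigs, seen = [], set()
    for g in genes:
        if g in seen or ((g, "h") in partner and (g, "t") in partner):
            continue  # start walks only at contig ends
        cur, sign = g, "+" if (g, "h") not in partner else "-"
        contig = []
        while True:
            seen.add(cur)
            contig.append(sign + cur)
            exit_end = (cur, "t") if sign == "+" else (cur, "h")
            if exit_end not in partner:
                break
            cur, side = partner[exit_end]
            sign = "+" if side == "h" else "-"  # entering via the head means +gene
        contigs.append(contig)
    return contigs


# Toy leaf chromosomes; adjacency frequency stands in for a probability here.
leaves = [["+a", "+b", "-c", "+d"], ["+a", "+b", "-c"],
          ["-d", "+c", "-b"], ["+d", "+a"]]
counts = Counter(adj for chrom in leaves for adj in extract_adjacencies(chrom))
scores = {adj: n / len(leaves) for adj, n in counts.items()}
print(greedy_assemble(["a", "b", "c", "d"], scores))
# [['+a', '+b', '-c', '+d']]
```

In the toy run, the low-scoring adjacency (+d, +a) is rejected because it would close a cycle. Because the rule commits to the best-scoring adjacency first and never revisits that choice, conflicting adjacencies with similar scores can leave it with more contigs than necessary.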
Fei Hu and Jijun Tang are affiliated with the Tianjin Key Laboratory of Cognitive Computing and Application at Tianjin University, China, and with the Department of Computer Science and Engineering at the University of South Carolina. E-mail: jtang@cse.sc.edu. Jun Zhou and Lingxi Zhou are Ph.D. students in the Department of Computer Science and Engineering at the University of South Carolina.

Currently, most methods are restricted to datasets involving only rearrangements. Under such a model, all species must have equal gene content, with each gene present in exactly one copy in every species. Therefore, in this study we propose PMAG+ as an extension of our previous method PMAG, in order to efficiently handle datasets that have undergone large-scale rearrangements as well as insertions and deletions (indels) of single genes or gene segments. Our experimental results on simulated datasets suggest that PMAG+ can efficiently and accurately predict both ancestral gene contents and ancestral gene orders.

2 EVOLUTION OF GENE ORDERS

Given a set of $n$ genes labeled $\{1, 2, \ldots, n\}$, a genome can be represented by an ordering of these genes. Each gene is assigned an orientation that is either positive, written $i$, or negative, written $-i$. Two genes $i$ and $j$ form an adjacency $(i, j)$ if $i$ is immediately followed by $j$, or, equivalently, $-j$ is immediately followed by $-i$. If gene $k$ lies at one end of a linear chromosome, we let $k$ be adjacent to an extremity $e$ that marks the beginning or end of the chromosome, written $(e, k)$ or $(k, e)$ and called a telomere.

Genome rearrangement operations change the ordering of genes on chromosomes. An inversion (also called a reversal) reverses a segment of a chromosome. A transposition swaps two segments of a chromosome. In the case of multiple chromosomes, a translocation breaks a chromosome and reattaches a part of it to another chromosome, while a fusion joins two chromosomes and a fission splits one
chromosome into two. Yancopoulos et al. [1] proposed a universal double-cut-and-join (DCJ) operation that accounts for all of these common events.

Another set of operations can alter the gene content of a genome. A deletion (also called a loss) removes a single gene or a segment of genes from the genome. Its reverse operation, an insertion, introduces one gene or a segment of genes not seen before into a chromosome. Whole genome duplication (WGD) creates an additional copy of the entire genome of a species.

3 METHODS FOR SOLVING THE SMALL PHYLOGENY PROBLEM (SPP)

In the context of event-based methods, a typical way to solve the SPP is to iterate over the internal nodes, solving for median genomes, until the sum of all edge distances (the tree score) is minimized. The median problem can be formalized as follows: given a set of $m$ genomes with permutations $\{x_i\}_{1 \leq i \leq m}$ and a distance measure $d$, find another permutation $x$ such that the median score, defined as $\sum_{i=1}^{m} d(x, x_i)$, is minimized. GRAPPA [2] and MGR [3] (as well as their recently enhanced versions) are two widely referenced methods that implement a selection of median solvers for phylogeny and ancestral gene-order inference. However, even the simplest case of the median problem, with $m = 3$, is NP-hard for most distance measures.

Progress has been made in handling genomes with unequal gene content. Tang and Moret proposed a two-phase method [4] in which the best gene content for the median is computed first, and a branch-and-bound approach is then used to determine the best ordering of that gene content. Zhang et al. [5] later extended Caprara's inversion median solver and proposed a simplified DCJ-based distance computation for unichromosomal genomes with indels.

The first adjacency-based method in a probabilistic framework was introduced in InferCarsPro [6]. The key idea of this method is to estimate the posterior probability of observing an adjacency in the ancestor based on an extended Jukes-Cantor model for breakpoints. With the obtained adjacency probabilities, it then uses a greedy heuristic to find a valid gene order for each ancestor. Later, Hu et al. proposed a faster and more accurate method, PMAG [7]. Although PMAG also computes probabilities for adjacencies and uses the same greedy heuristic to assemble gene orders, it avoids analyzing predecessor and successor relationships and directly calculates probabilities only for the subset of adjacencies that appear in the leaf nodes. However, neither method can handle datasets with indels, and the greedy heuristic often returns an excessive number of contigs (fragments of chromosomes) when some adjacencies have equally high probabilities but conflict with each other.

In the past few years, several methods have been proposed to accommodate datasets with unequal gene content [8], [9], [10]. Among them, the most recent, GapAdj [10], uses a different scoring mechanism for gene adjacencies and reduces the assembly problem to an instance of TSP. To filter out less reliable adjacencies, it introduces a cutoff value and removes adjacencies scoring below it from the TSP solution. Further, by treating pairs of genes separated by up to a given number of genes as direct gene adjacencies, it iteratively combines contigs into longer ones. Compared to InferCars [11], GapAdj produces a number of contigs that is closer to the actual number of chromosomes, at the cost of accuracy. Through a natural process for inferring ancestral gene contents described in [12], GapAdj also supports the analysis of unequal gene contents.

4 ALGORITHM DETAILS

Given a phylogeny, our new method computes the gene content and ordering of the ancestral (internal) nodes one at a time. Before inferring a target ancestral node, we reroot the given phylogeny at that node so that it becomes the root of the new tree. The rationale is that probabilities are calculated bottom-up and only the species in the subtree of the target node are considered; rerooting therefore prevents loss of information. Rerooting is a standard procedure that has already been used for ancestral genome reconstruction [6], [7]. After rerooting, PMAG+ performs the following three steps: 1) inferring the gene content of the target node to determine which genes should appear; 2) computing the probabilities of gene adjacencies; and 3) forming and solving a TSP instance to place genes on chromosomes. The following subsections describe these steps in detail.

4.1 Inference of Ancestral Gene Contents

The very first step of ancestral reconstruction often involves explicitly estimating the gene content of ancestral nodes from the content information of the leaves. A number of approaches
have been developed, and most of them are similar in spirit to the Fitch-Hartigan parsimony algorithm [4], [12], [13]. For pure rearrangements, every gene observed in the leaf species should also be present in all ancestors; in the presence of gene indels, however, this correspondence no longer holds, and a gene can be either present or absent in an ancestor. Our inference of ancestral contents therefore treats genes as independent characters with binary states, so that the state of every gene in the ancestor can be determined.

The first step encodes the gene contents of the leaf species into binary sequences. In particular, suppose a dataset $G$ with $N$ species is given and a set of $n$ distinct genes $S = \{g_1, g_2, \ldots, g_n\}$ is identified from $G$. For each leaf species $G_i \in G$, its gene content $S_i = \{g_1, \ldots, g_k\}$ with $k \leq n$ can be equivalently represented by a sequence $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_n)$ in which each element has two states: $\sigma_j = 1$ if $g_j \in S_i$, and $\sigma_j = 0$ otherwise, for all $j$ ($1 \leq j \leq n$). For instance (Table 1), a total of five distinct genes $\{a, b, c, d, e\}$ can be identified from two toy species $G_1$ and $G_2$ with gene orders $(+a, -c, +d)$ and $(+b, +a, -e)$, respectively.

TABLE 1: Example of binary encoding on gene content.

        a   b   c   d   e
  G1    1   0   1   1   0
  G2    1   1   0   0   1

Many methods are available to infer ancestral states from binary characters, including RAxML [14] for maximum likelihood and PAUP* [15] for parsimony. In this study, we chose RAxML (version 7.2.8 was used to produce the results given in this paper) to infer the states. Once the probabilities of the presence state, $P = \{p_1, p_2, \ldots, p_n\}$, have been computed for the root node, gene $i$ belongs to the gene content of the root, $S_{root}$, if $p_i \geq 0.5$; otherwise, gene $i$ is not in $S_{root}$. Following this paradigm, the gene contents of all ancestral nodes can be inferred separately from the leaf species. Our simulations show that this approach estimates gene contents with less than 1% error, even for very difficult datasets.

4.2 Inferring the Probabilities of Ancestral Gene Adjacencies

In [7], we presented an adjacency-based method in a probabilistic framework, called PMAG, to calculate the probability of observing an adjacency in the target ancestral node. The method proceeds in three main steps.

Step 1. Each species in the dataset is screened to identify all unique gene adjacencies and telomeres. By viewing each adjacency and telomere as an independent character with binary states (presence or absence), the gene orders of the species can be rigorously encoded into aligned sequences of binary characters.

Step 2. The phylogeny is rerooted at the target ancestral node in order to take all leaf species into consideration. At the same time, a 2n ratio for the base composition is set up such that the rate of presence-to-absence transitions is roughly $2n$ times the rate of transitions in the other direction over the same evolutionary distance, where $n$ is the number of genes. Such a model has been used successfully for phylogeny reconstruction [16].

Step 3. The probabilities of the character states of all gene adjacencies and telomeres at the root node are computed. The marginal ancestral reconstruction approach suggested by Yang [17] for molecular data was adopted and extended to compute for t[...]

PMAG+ reuses these three steps to calculate the probabilities of adjacencies and telomeres. Once these probabilities are obtained, it uses the following step to connect gene adjacencies and telomeres into contigs, from which the ancestral gene ordering can be identified.

4.3 Assembling Ancestral Adjacencies into Ancestral Gene Orders

The last step assembles gene adjacencies and telomeres into a valid gene order, with respect to the gene content inferred in the first step. In general, a higher probability of the presence state implies that an adjacency or telomere is more likely to be included in the ancestor; however, the decision to choose an adjacency or telomere cannot be based on its own probability alone, because each gene extremity can take part in only one adjacency.

In PMAG, ancestral adjacencies are assembled by a greedy heuristic based on the adjacency graph proposed by Ma et al. This greedy method starts a contig from the first gene and picks its neighbor using the adjacency with the highest probability; it then keeps adding new genes until no valid connection remains, at which point the current contig is closed and a new one is started. Two issues with this approach motivated us to replace the greedy assembler with an exact solver. First, the greedy heuristic achieves a good approximation only when the species in the dataset are closely related, in which case most vertices in the graph have only one outgoing edge. Second, the greedy heuristic tends to return an excessive number of contigs, as it frequently leads itself into dead ends.

Obtaining gene orders from (conflicting) adjacencies can be transformed into an instance of the symmetric Traveling Salesman Problem (TSP), as shown in [10], [18]. Genes become cities and adjacency probabilities become edge weights in the TSP graph. In particular, suppose that for the target ancestral node $I$ we have identified a set of $m$ adjacencies $A = \{a_1, a_2, \ldots, a_m\}$ and $n$ telomeres $T = \{t_1, t_2, \ldots, t_n\}$ from the leaf species. If the gene content of $I$ has been inferred as $S_I = \{g_1, g_2, \ldots, g_k\}$ and the probabilities $P = \{p_{a_1}, \ldots, p_{a_m}, p_{t_1}, \ldots, p_{t_n}\}$ of each adjacency and telomere are known, we create the TSP graph $G$ as follows:

1) Each gene $g \in S_I$ is represented by two vertices, its head and its tail, denoted $g^h$ and $g^t$ respectively. The extremity of each telomere $t_i \in T$ is represented by a unique vertex $e_i$, where $1 \leq i \leq n$. The total number of vertices in the graph is therefore $2k + n$.
2) An edge between the head and tail of each gene, $(g^h, g^t)$, is added with weight $-\infty$ to guarantee that this connection is present in the solution. Edges with weight $-\infty$ are also added for all pairs of extremities $(e_i, e_j)$ with $i \neq j$ and $1 \leq i, j \leq n$.
3) For every adjacency $(f, g) \in A$, the corresponding edge is added to $G$, connecting $f^t$ and $g^h$. Similarly, for the other combinations of orientations $(-f, g)$, $(f, -g)$ and $(-f, -g)$, we add $(f^h, g^h)$, $(f^t, g^t)$ and $(f^h, g^t)$, respectively.
4) For every telomere $(e, g) \in T$, we add an edge to $G$ between $e_i$ and $g^h$. In the case of $(g, e)$, an edge between $g^t$ and $e_i$ is added.
5) All remaining edges in $G$ receive weight $+\infty$, which excludes them from the solution.

As the inferred probabilities range from 0 to 1, using them directly as edge weights may introduce undesirable effects associated with handling small floating-point numbers, whereas the TSP needs a precise and fine-grained set of edge weights to ensure the quality of its solution. The most straightforward option is to make the edge weight linear in the probability; in that case, however, the differences in weight between adjacencies are too strong, and adjacencies with smaller probabilities can hardly be considered. We therefore use the following equation to curve the probabilities into edge weights:

$$w_{(f,g)}(m) = \log_2\left(10^{m}\,\bigl(1 - p_{(f,g)}\bigr)\right) \qquad (1)$$

where $(f, g) \in A \cup T$ and $p_{(f,g)}$ is the probability of observing $(f, g)$. The parameter $m$ alone determines the shape of the curve; according to our experiments, the TSP yields good results when $m = 6$. We then use Concorde [19], one of the most widely used TSP solvers, to find the optimal tour, which visits every vertex once with the minimum total weight. In the solution, consecutive extremities are shrunk into a single one, and each gene segment between two extremities is taken as a contig.

Our construction of the TSP graph is similar in spirit to that of GapAdj; however, GapAdj requires additional procedures and parameters to adjust the number of contigs. Our inference of the ancestral genome, in contrast, is read uniformly and directly from the TSP solution, minimizing the risk of introducing artifacts.

5 RESULTS

5.1 Experimental Design

To evaluate the performance of PMAG+, we ran a series of experiments on simulated datasets under a wide variety of settings. We generated model topologies from uniformly distributed binary trees, each with $s$ species. An initial gene order of $n$ distinct genes on $m$ chromosomes was assigned to the root and then evolved down to the leaves along the tree topology through a set of predefined evolutionary events, mimicking the natural process of evolution. We used different evolutionary rates $r$ with 50% relative fluctuation, so that the actual number of events per edge lies in the interval $[\lfloor rn/2 \rfloor, \lfloor rn \rfloor]$. Several kinds of evolutionary events were considered (inversions, translocations and indels), and each kind was assigned a probability of being selected during the simulation. In this paper, we present only results with 20 genomes, each with 1000 genes and 5 chromosomes, to closely mimic bacterial genomes. The evolutionary rates were set so that $rn$ ranged from 50 to 200 events ($r$ from 5% to 20%), the latter representing highly disturbed datasets. For each combination of evolutionary events, we simulated 10 datasets and report averages and standard deviations.

Our predicted ancestral genomes are evaluated by the proportion of correct adjacencies and telomeres recovered. Specifically, we use the following equation to compute the error rate of a reconstruction:

$$E = \left(1 - \frac{|D \cap D'|}{|D \cup D'|}\right) \times 100\%$$

where $D$ is the set of gene adjacencies and telomeres in the true genome and $D'$ the corresponding set in the predicted genome. We further call an element that is contained in the inferred set $S'$ but not in the true set $S$ a false positive (FP); a false negative (FN) is defined similarly, by swapping $S$ and $S'$.

5.2 Assessing the Accuracy of Ancestral Gene Contents

We first ran simulations to test PMAG+ on the inference of ancestral gene contents. Each gene order, derived from its direct ancestor through a number of events, underwent random indels and inversions (the two boundaries of each inversion were uniformly distributed). Two different probabilities of occurrence for indels (5% and 10%) were used. We compared each inferred gene content with the corresponding true content and counted the numbers of FPs and FNs. For each dataset, we summed the numbers of FPs and FNs over all internal nodes and divided them by the total number of genes in all ancestral nodes that are missing or inserted. Figure 1 shows our results. The FP rates are always extremely low (only one dataset produced FPs), indicating that our inference avoids introducing erroneous gene content and that the inferred contents are reliable. FN rates increase slightly when more indel operations are performed, but even in the worst case the error rate stays below 1%. We also ran GapAdj without specifying any WGD node, with the cut-off value and the maximal number of iterations set to 0.6 and 25, as suggested. According to the results, GapAdj failed to infer a large portion of the inserted genes, making its FP rates higher than 60% in all cases.

Fig. 1 data (average numbers of errors shown above the plotted points):

  Evolutionary rate (%)          5      10     15     20
  (a) 5% gene insertion and deletion
      Inserted/deleted genes     500    1045   1370   1599
      False positives            0      0      0      0
      False negatives            2.6    3.4    5.5    7.2
  (b) 10% gene insertion and deletion
      Inserted/deleted genes     951    1807   2432   3231
      False positives            0      1.4    0      0
      False negatives            3.7    9.6    14.8   21.8

Fig. 1: FP and FN rates (divided by the numbers on the upper x-axis) with standard deviations under various evolutionary rates and indel rates: (a) 5% gene insertion and deletion; (b) 10% gene insertion and deletion. The y-axis shows the FP and FN rate (%) and the lower x-axis the evolutionary rate (%). Labels on the upper x-axis give the total number of genes inserted or deleted over all internal nodes due to indel operations. Numbers above points indicate the actual number of errors on average.

5.3 Assessing the Accuracy of Ancestral Gene Orders

We conducted several tests to evaluate the accuracy of PMAG+ under different degrees of indels. Our first test compares PMAG+ with the current standard approach, which reduces the dataset to equal content by eliminating genes that are not present in every genome; this forms the baseline method (named PMAG+-Base). Our second test gives PMAG+ the ground-truth content (named PMAG+-True) to eliminate all impact of gene content inference. To compare the greedy heuristic with the TSP solution, we switched back to the greedy heuristic and repeated the tests (named PMAG+-Greedy). Finally, the results of GapAdj (to our knowledge, the most recent method) are reported. For a fair comparison, we also compared PMAG+ with GapAdj on datasets without indel operations. The error rates of the designed experiments are shown in Figure 2.

Fig. 2 panels (plots not recoverable): (a) 90% Inv and 10% Tsl, error rate (%) of PMAG+, PMAG+-Greedy and GapAdj; (b) 5% Ins and Del, 80% Inv and 10% Tsl, and (c) 10% Ins and Del, 70% Inv and 10% Tsl, error rate (%) of PMAG+, PMAG+-True, PMAG+-Base, PMAG+-Greedy and GapAdj; (d) running time (s) of PMAG+, PMAG+-Greedy and GapAdj. All panels are plotted against evolutionary rates (%) of 5, 10, 15 and 20.

Fig. 2: (a), (b) and (c) summarize the error rates under various evolutionary rates and combinations of evolutionary events (Ins for insertion, Del for deletion, Inv for inversion and Tsl for translocation). (d) shows the running time of the methods in (a). Error bars indicate the standard deviations.

From the figure, the error rates of both PMAG+ and PMAG+-True are the lowest in all cases, and the two approaches are almost indistinguishable, indicating that the errors introduced by a very limited amount of false content are not significant. As expected, PMAG+-Base recovered the fewest adjacencies due to the loss of content. GapAdj, because of its failure in gene content inference, had much higher error rates in the presence of indels. Even in the test with equal gene content, PMAG+ still outperforms GapAdj, with around 5% higher accuracy.

PMAG+-Greedy came very close to PMAG+; however, in all tests PMAG+ returned a more accurate reconstruction than PMAG+-Greedy, suggesting the usefulness of our TSP assembler.

Using different degrees of indels has little impact on the performance of PMAG+. From the perspective of adjacency evolution, an inversion always breaks two extant adjacencies and creates two new ones, and the disturbance that an indel operation introduces on adjacencies is essentially very similar. In particular, a deletion breaks two adjacencies and creates a new one, while an insertion breaks one adjacency and introduces two new ones. Therefore, as long as ancestral gene contents are accurately predicted, PMAG+ returns comparable results for all combinations of evolutionary events.

The last panel, Figure 2(d), summarizes the running time of all methods. PMAG+-Greedy, benefiting from the greedy heuristic, is indeed slightly faster than PMAG+, while GapAdj, which solves the TSP heuristically, took longer to finish than PMAG+ with its exact solver.

5.4 Assessing the Number of Inferred Contigs

In [7], PMAG was tested only with unichromosomal genomes, and the inferred ancestral genomes were always composed of a large number of contigs. GapAdj designed a series of algorithms with two arguments to reconnect contigs into chromosomes, under the restriction of local and small evolutionary operations. Our method PMAG+, on the other hand, treats telomeres as a special type of adjacency and thus finds the best set of adjacencies and telomeres simultaneously, in one step. Since translocations account for the inter-chromosomal rearrangements and can be equivalently viewed as a fission followed by a fusion, all ancestors should have the same number of chromosomes as the root node, which is 5 in our test cases. For each dataset with $N$ ancestors, the number of contigs $c_i$ ($1 \leq i \leq N$) in each ancestor was counted, and the average absolute difference per ancestral node, $\frac{1}{N}\sum_{i=1}^{N} |c_i - 5|$, was computed to assess the accuracy of chromosomal assembly. Figure 3 summarizes our findings. As predicted, the number of contigs produced by PMAG bore no relation to the true number of chromosomes, while GapAdj could indeed remove a large portion of the redundant contigs. In comparison, the number of contigs returned by PMAG+ precisely reflects the actual number of chromosomes in the true genomes.

Fig. 3 panels (plots not recoverable): (a) 0% gene insertion and deletion; (b) 10% gene insertion and deletion. The y-axis shows the average absolute differences per node and the x-axis the evolutionary rate (%), for PMAG+, PMAG+-Greedy and GapAdj.

Fig. 3: The average absolute differences per ancestral node produced by various methods. Error bars indicate the standard deviations.

6 CONCLUSIONS

In this study, we proposed a new adjacency-based method called PMAG+ to infer ancestral gene orders under a more general model of gene evolution, including intra-chromosomal and inter-chromosomal rearrangements as well as gene insertions and deletions. As real ancestors are unknown, we tested our method through a series of simulation studies. According to the results, PMAG+ can accurately deduce ancestral gene contents, with error rates below 1%. In the subsequent inference of ancestral gene orders, PMAG+ outperformed the other methods tested. Moreover, by adopting a TSP solution for adjacency assembly, PMAG+ not only overcomes the problem of producing excessive contigs but also achieves better performance than PMAG.

7 ACKNOWLEDGMENT

FH, JZ, LZ and JT are supported by grants US NSF #0904179 and #1161586.
REFERENCES

[1] S. Yancopoulos, O. Attie, and R. Friedberg, "Efficient sorting of genomic permutations by translocation, inversion and block interchange," Bioinformatics, 21(16):3340-3346, 2005.
[2] B. Moret, S. Wyman, D. Bader, T. Warnow, and M. Yan, "A new implementation and detailed study of breakpoint analysis," in Proc. 6th Pacific Symp. Biocomputing (PSB'01), 583-594, 2001.
[3] G. Bourque and P. Pevzner, "Genome-scale evolution: reconstructing gene orders in the ancestral species," Genome Research, 12:26-36, 2002.
[4] J. Tang, B. Moret, L. Cui, and C. dePamphilis, "Phylogenetic reconstruction from arbitrary gene-order data," in Proc. 4th IEEE Symp. on Bioinformatics and Bioengineering (BIBE'04), 592-599, 2004.
[5] Y. Zhang, F. Hu, and J. Tang, "Phylogenetic reconstruction with gene rearrangements and gene losses," in 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM'10), 35-38, 2010.
[6] J. Ma, "A probabilistic framework for inferring ancestral genomic orders," in 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM'10), 179-184, 2010.
[7] F. Hu, L. Zhou, and J. Tang, "Reconstructing ancestral genomic orders using binary encoding and probabilistic models," in 9th International Symposium on Bioinformatics Research and Applications (ISBRA), 17-27, 2013.
[8] J. Ma, A. Ratan, B. Raney, B. Suh, W. Miller, and D. Haussler, "The infinite sites model of genome evolution," Proceedings of the National Academy of Sciences, 105(38):14254-14261, 2008.
[9] S. Berard, C. Gallien, B. Boussau, G. Szollosi, V. Daubin, and E. Tannier, "Evolution of gene neighborhoods within reconciled phylogenies," Bioinformatics, 28(18):i382-i388, 2012.
[10] Y. Gagnon, M. Blanchette, and N. El-Mabrouk, "A flexible ancestral genome reconstruction method based on gapped adjacencies," BMC Bioinformatics, 13(Suppl 19):S4, 2012.
[11] J. Ma, L. Zhang, B. Suh, B. Raney, R. Burhans, W. Kent, M. Blanchette, D. Haussler, and W. Miller, "Reconstructing contiguous regions of an ancestral genome," Genome Research, 16(12):1557-1565, 2006.
[12] J. Gordon, K. Byrne, and K. Wolfe, "Additions, losses, and rearrangements on the evolutionary route from a reconstructed ancestor to the modern Saccharomyces cerevisiae genome," PLoS Genetics, 5(5):e1000485, 2009.
[13] V. Kunin and C. Ouzounis, "GeneTRACE: reconstruction of gene content of ancestral species," Bioinformatics, 19(11):1412-1416, 2003.
[14] A. Stamatakis, "RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models," Bioinformatics, 22(21):2688-2690, 2006.
[15] D. Swofford, "PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4," 2003.
[16] Y. Lin, F. Hu, J. Tang, and B. Moret, "Maximum likelihood phylogenetic reconstruction from high-resolution whole-genome data and a tree of 68 eukaryotes," in Pacific Symposium on Biocomputing (PSB'13), 285-296, 2013.
[17] Z. Yang, S. Kumar, and M. Nei, "A new method of inference of ancestral nucleotide and amino acid sequences," Genetics, 141(4):1641-1650, 1995.
[18] J. Tang and L. S. Wang, "Improving genome rearrangement phylogeny using sequence-style parsimony," in Proc. 5th IEEE Symp. on Bioinformatics and Bioengineering (BIBE'05), 137-144, 2005.
[19] D. Applegate, R. Bixby, V. Chvatal, and W. Cook, "Concorde TSP solver," URL: http://www.math.uwaterloo.ca/tsp/concorde/ (2011).

Fei Hu received his bachelor's degree in biomedical engineering from Huazhong University of Science and Technology. His research interests are mainly in phylogenetic reconstruction and the inference of ancestral genomes using gene-order data.

Jun Zhou completed his bachelor's degree in Biotechnology in 2008 at Nanjing University, China. He had his first contact with bioinformatics in 2012, when he started working in the computer science department on a project inferring ancestral genome information. He is currently a Ph.D. student in the computer science department at the University of South Carolina, studying the small phylogeny problem.

Lingxi Zhou is a Ph.D. candidate in computer science and engineering, supervised by Dr. Jijun Tang in the bioinformatics lab of the University of South Carolina. Before that, he received his B.S. degree from the College of Computer Science and Technology of Jilin University in July 2011.

Jijun Tang obtained his Ph.D. from the University of New Mexico in 2004. He is now an associate professor in Computer Science and Engineering at the University of South Carolina, USA. He is also an adjunct professor in the School of Computer Science and Technology, Tianjin University, China. His main research area is computational biology, with a focus on algorithm development for phylogenetic reconstruction from genome rearrangement data.
