BioCyc: "Big Knowledge" of Genomes and Metabolic Pathways - PowerPoint Presentation

Uploaded On 2024-01-03


Presentation Transcript

1. BioCyc: "Big Knowledge" of Genomes and Metabolic Pathways
- Peter D. Karp, SRI International
- ecocyc.org, biocyc.org, metacyc.org

2. BioCyc Overview
- Integrates diverse data and knowledge for thousands of genomes
  - Curated and computationally generated
- Serves a diverse set of use cases
- Metabolic-flux models for individual organisms and organism communities
- Extensive domain-specific visualization tools
- Sustainability: New subscription model

3.

4. EcoCyc Project (EcoCyc.org)
- E. coli Encyclopedia
- Organism-specific database for the best-studied organism on earth: Escherichia coli (first isolated in 1885)
- Biologists enter information from E. coli publications
  - EcoCyc contents derived from 31,000 publications
  - Mini-review summaries (2,300 pages) and literature citations
- Programmers author web query and visualization tools
- "Multi-dimensional annotation of the E. coli genome"
  - Describes the molecular parts-list of E. coli and the functions of those parts
  - Gene functions, regulation of gene expression
- Metabolic model derived from EcoCyc predicts lethality of gene knock-outs with 95% accuracy
- Nucleic Acids Research 41:D605 2013

5. Model Organism Databases (MODs)
- Each "complete" genome is incomplete in several respects:
  - 40%-60% of genes have no assigned function
  - Roughly 7% of those assigned functions are incorrect
- Need continuous updating of these databases to incorporate new experimental findings
  - Gene positions, sequence, gene functions, regulatory sites, pathways
- MODs are platforms for global analyses of an organism
  - Interpret omics data in a pathway context
  - In silico prediction of essential genes
  - Characterize systems properties of metabolic and genetic networks

6. Pathway/Genome Database Organization
- Diagram labels: Chromosomes, Plasmids, Genes, Proteins, RNAs, Reactions, Pathways, Compounds, CELL
- Regulation: Operons, Promoters, DNA Binding Sites, Regulatory Interactions
- Sequence Features

7. BioCyc Collection of 7,600 Pathway/Genome Databases
- Pathway/Genome Database (PGDB): combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors/sites, promoters, operons
- Tier 1: Highly curated PGDBs
  - MetaCyc, HumanCyc, YeastCyc
  - EcoCyc: Escherichia coli K-12
  - AraCyc: Arabidopsis thaliana
- Tier 2: Moderately curated, 34 PGDBs
  - Bacillus subtilis, Mycobacterium tuberculosis
- Tier 3: Computationally derived DBs, no curation, ~7,600 PGDBs
- Curated information derived from 66,000 publications

8. Creation of BioCyc Databases
- Diagram labels: NIH RefSeq, PGDB
- Predict metabolic reactions
- Predict transport reactions
- Predict pathway hole fillers
- Predict metabolic pathways
- Predict operons
- Protein features [uniprot]
- Compute Pfam domains
- Compute orthologs
- GO terms [uniprot]
- Subcellular locations [psortdb]
- Regulatory data [regtransbase]
- Database links
- Organism phenotype data
- Gene essentiality data
- Phenotype microarray data
- Diagram labels: Curation, Data Import, Computational Inferences

9. What is Curation?
- Ongoing manual updating and refinement of a PGDB
- Incorporating information from experimental literature
- Authoring of mini-reviews and citations
- Updating database fields
  - Gene positions, names, synonyms
  - Protein functions, activators, inhibitors
- Addition of new reactions and chemical compounds

10. BioCyc Accelerates Science
- Experimental biologists
- Computational biologists
  - Study properties of E. coli metabolic and regulatory networks
- Bioinformaticists
  - Training and validation of new bioinformatics algorithms: predict operons, promoters, protein functional linkages, protein-protein interactions
- Metabolic engineers
  - "Design of organisms for the production of organic acids, amino acids, ethanol, hydrogen, and solvents"
- Educators
  - Microbiology and metabolism education

11. BioCyc Databases Serve Multiple Use Cases
- Zoomable Metabolic Map
- Queryable Database
- Omics Data Analysis
- Metabolic Model
- Encyclopedia

12. Perspective 1: EcoCyc as Online Encyclopedia
- All genes for which experimental literature exists are curated with a minireview summary
  - 3,730 gene products contain summaries
  - Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
- Additional summaries and other data found in pages for genes, operons, pathways
- Textbook-equivalent pages in summaries: 2,300

13. Perspective 2: EcoCyc as Queryable Database
- 350 database fields capture object properties and relationships
- Each molecular species defined as a DB object
  - Genes, proteins, small molecules
- Each molecular interaction defined as a DB object
  - Metabolic and transport reactions, regulation
- Extensive search tools
  - Object-specific search: Search Menu
  - Advanced search: Search -> Advanced
- APIs enable programs to compute across the data
  - Python, Java, Perl, Lisp, Web services
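As a rough illustration of the "each molecular species and each interaction is a DB object" idea, the sketch below models genes, proteins, and reactions as linked Python objects and runs two queries across them. Every class, field, and reaction label here is hypothetical and chosen for illustration; this is not the EcoCyc schema or the Pathway Tools API.

```python
from dataclasses import dataclass, field

# Hypothetical object model; class and field names are illustrative only.

@dataclass
class Gene:
    name: str
    product: str                      # name of the protein the gene encodes

@dataclass
class Protein:
    name: str
    catalyzes: list = field(default_factory=list)   # reaction labels

@dataclass
class Reaction:
    name: str
    substrates: list
    products: list

def genes_for_reaction(reaction_name, genes, proteins):
    """Names of genes whose product catalyzes the given reaction."""
    enzymes = {p.name for p in proteins if reaction_name in p.catalyzes}
    return sorted(g.name for g in genes if g.product in enzymes)

def reactions_consuming(compound, reactions):
    """Labels of reactions that list the compound as a substrate."""
    return sorted(r.name for r in reactions if compound in r.substrates)

genes = [
    Gene("pfkA", "6-phosphofructokinase I"),
    Gene("pfkB", "6-phosphofructokinase II"),
    Gene("lacZ", "beta-galactosidase"),
]
proteins = [
    Protein("6-phosphofructokinase I", ["PFK"]),
    Protein("6-phosphofructokinase II", ["PFK"]),
    Protein("beta-galactosidase", ["LACTASE"]),
]
reactions = [
    Reaction("PFK", ["fructose 6-phosphate", "ATP"],
             ["fructose 1,6-bisphosphate", "ADP"]),
    Reaction("LACTASE", ["lactose", "H2O"], ["galactose", "glucose"]),
]

pfk_genes = genes_for_reaction("PFK", genes, proteins)
atp_users = reactions_consuming("ATP", reactions)
print(pfk_genes, atp_users)
```

Because relationships are stored as explicit links between objects, a question such as "which genes encode an enzyme for this reaction" becomes a short traversal rather than a text search.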

14. Diagram labels: Pathway/Genome Editors, Pathway/Genome Database, PathoLogic, Annotated Genome, Pathway/Genome Navigator, + MetaFlux
- Briefings in Bioinformatics 11:40-79 2010
- 640,000 lines of Lisp code ~= 1.5M lines of C or Java code

15. Pathway Tools Software Stack
- OS: Windows, Mac OS X, Linux
- DB: Ocelot, MySQL, SOLR
- Prog Lang: Javascript, Common Lisp
- GUI: Ghostscript, SKIPPY, YUI
- Solver: SCIP
- Bioinformatics: Textpresso, MUSCLE, PatMatch, BLAST, libSBML, cytoscape.js
- Chemoinformatics: Marvin, GlycanBuilder, InChI

16. Genome-scale Visualizations of Cellular Networks
- Generated automatically from PGDB
- Magnify, interrogate
- Paint with high-throughput data

17. Gene Expression Data on Single Pathway

18. Pathway Collage with Gene Expression Data

19. E. coli Cellular Overview

20.

21.

22.

23. Gene Expression Data on Cellular Overview

24. Genome Overview

25. Genome Poster

26. Genome Overview

27. Regulatory Overview
- Show regulatory relationships among gene groups

28. Regulatory Omics Viewer

29. Metabolic Modeling Applications
- Predict steady-state reaction fluxes for the metabolic network
- Predict growth rates, nutrient uptake rates
- Remove genes/reactions from model to predict knock-out phenotypes
  - E. coli: 95.2% accuracy for 1445 genes
- Serve as quality check on EcoCyc data
- Metabolic engineering
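The accuracy figure is presumably the fraction of genes whose predicted knock-out phenotype agrees with the experimentally observed one; the slide does not spell out the metric, so that reading is an assumption. A minimal sketch of the calculation follows, using made-up placeholder genes rather than real essentiality data.

```python
def knockout_accuracy(predicted, observed):
    """Fraction of genes whose predicted growth call matches the observed one.

    Both arguments map a gene name to True (knock-out still grows)
    or False (knock-out is lethal). Only genes present in both are scored.
    """
    shared = predicted.keys() & observed.keys()
    if not shared:
        raise ValueError("no genes in common")
    correct = sum(predicted[g] == observed[g] for g in shared)
    return correct / len(shared)

# Placeholder data for illustration only; not taken from EcoCyc.
predicted = {"geneA": True, "geneB": False, "geneC": True, "geneD": True}
observed = {"geneA": True, "geneB": False, "geneC": False, "geneD": True}

accuracy = knockout_accuracy(predicted, observed)
print(f"{accuracy:.1%}")
```

Disagreements such as the hypothetical geneC above are what make the comparison useful as a quality check: each one points at either a model gap or a curation error.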

30. Flux-Balance Analysis
- Diagram labels: Nutrients, Biomass, Secretions, Metabolic Reaction List (nodes A, B, C, D, X)
- Steady state, constraint-based quantitative models of metabolism
- E. coli model derived from EcoCyc (BMC Sys Biol 2014 8:79):
  - 16 nutrients
  - 108 biomass metabolites
  - 2286 reactions
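A toy instance of the flux-balance formulation may help: choose fluxes v that maximize one target flux subject to the steady-state condition S·v = 0 and per-reaction bounds. The four-reaction network, its bounds, and all function names below are invented for illustration. The solver is a brute-force vertex enumeration that only suits tiny examples; it is not how production tools solve these problems (the software-stack slide lists SCIP as the solver).

```python
from fractions import Fraction
from itertools import combinations, product

def solve_square(A, b):
    """Solve A x = b by Gauss-Jordan elimination; return None if singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[-1] for row in M]

def fba(S, bounds, objective):
    """Maximize v[objective] subject to S v = 0 and lb <= v <= ub.

    Brute force: fix n - m fluxes at one of their bounds, solve for the
    remaining m, and keep the best feasible candidate. Assumes the m rows
    of S are independent and all bounds are finite.
    """
    m, n = len(S), len(S[0])
    best = None
    for fixed in combinations(range(n), n - m):
        free = [j for j in range(n) if j not in fixed]
        A = [[S[i][j] for j in free] for i in range(m)]
        for values in product(*(bounds[j] for j in fixed)):
            rhs = [-sum(S[i][j] * v for j, v in zip(fixed, values))
                   for i in range(m)]
            x = solve_square(A, rhs)
            if x is None:
                continue
            v = [Fraction(0)] * n
            for j, val in zip(fixed, values):
                v[j] = Fraction(val)
            for j, val in zip(free, x):
                v[j] = val
            feasible = all(bounds[j][0] <= v[j] <= bounds[j][1] for j in range(n))
            if feasible and (best is None or v[objective] > best[objective]):
                best = v
    return best

# Hypothetical network: uptake: -> A, convert: A -> B, biomass: B ->, secretion: A ->
reactions = ["uptake", "convert", "biomass", "secretion"]
S = [[1, -1, 0, -1],   # metabolite A
     [0, 1, -1, 0]]    # metabolite B
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

fluxes = dict(zip(reactions, fba(S, bounds, objective=2)))
print({name: float(v) for name, v in fluxes.items()})
```

In this toy network the uptake bound is the only limiting constraint, so the optimum routes all imported A to biomass and secretes none.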

31. Painting E. coli Fluxes on Metabolic Map

32. Dynamic FBA Modeling of E. coli
- Dynamic FBA modeling of E. coli growth under varying nutrient conditions
- t=1-20: E. coli grows anaerobically on 10 mmol glucose
- t=21-34: O2 is added to the simulation; E. coli grows completely aerobically
- t=34-35: O2 availability becomes limiting; acetate forms
- t=36-44: O2 is exhausted; anaerobic growth resumes
- t=45 onwards: glucose is exhausted, cells begin to die
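Phases like these arise from re-solving a flux-balance problem at each time step as nutrient pools change. The sketch below shows only that outer time-stepping loop, under strong simplifying assumptions: a single nutrient (glucose), no oxygen or acetate, no cell death, and a fixed biomass yield standing in for the per-step FBA optimization. All function names, parameter names, and values are illustrative and are not taken from the E. coli model.

```python
def simulate_growth(glucose_mmol, biomass_g, steps, dt=1.0,
                    max_uptake=10.0, yield_g_per_mmol=0.05):
    """Step a one-nutrient culture forward in time.

    max_uptake is in mmol per g biomass per time unit. The growth rate is
    yield * uptake, a stand-in for the optimum an FBA solve would return.
    Returns a list of (step, glucose_mmol, biomass_g) tuples.
    """
    history = []
    for t in range(steps):
        if biomass_g > 0:
            # Uptake cannot exceed the kinetic limit or what is left in the pool.
            uptake = min(max_uptake, glucose_mmol / (biomass_g * dt))
        else:
            uptake = 0.0
        growth_rate = yield_g_per_mmol * uptake
        # Update the pool with the old biomass, then grow the biomass.
        glucose_mmol = max(0.0, glucose_mmol - uptake * biomass_g * dt)
        biomass_g += growth_rate * biomass_g * dt
        history.append((t, glucose_mmol, biomass_g))
    return history

history = simulate_growth(glucose_mmol=10.0, biomass_g=0.1, steps=20)
for step, glucose, biomass in history[:6]:
    print(step, round(glucose, 3), round(biomass, 3))
```

A fuller version would replace the fixed-yield line with a constrained optimization whose uptake bounds for glucose and O2 are reset from the current pools at every step, which is what lets behaviours such as acetate formation under O2 limitation emerge.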

33. Dynamic Grid Modeling of a Simple Microbial Community
- Initially, E. rectale is present throughout the grid; E. coli is present in the southwest corner
- Halfway through the simulation, B. thetaiotaomicron is added to the middle of the lawn
- E. rectale shows higher growth where E. coli or B. theta are present because of availability of acetate from E. coli
- E. rectale produces butyrate where acetate is present

34. BioCyc and Pathway Tools Availability
- BioCyc.org Web site and database files available via subscription starting July 1, 2016

35. Acknowledgements
- SRI: Suzanne Paley, Ron Caspi, Mario Latendresse, Ingrid Keseler, Carol Fulcher, Markus Krummenacker, Richard Billington, Pallavi Kaipa
- EcoCyc Collaborators: Julio Collado-Vides, Robert Gunsalus, Ian Paulsen
- MetaCyc Collaborators: Lukas Mueller, Hartmut Foerster, Sue Rhee, Peifen Zhang
- Funding sources: NIH National Institute of General Medical Sciences
- http://www.ai.sri.com/pkarp/talks/
- BioCyc webinars: biocyc.org/webinar.shtml