Slide1
Creating a … Community Database
Organism-Specific Database
Model-Organism Database
Slide2
Why Create a PGDB?
Perform pathway analyses as part of a genome project
Analyze omics data
Create a central public information resource for the organism, update on ongoing basis
Create a metabolic model
Perform comparative analyses
Slide3
Model Organism Databases
DBs that describe the genome and other information about an organism
Curated by experts for that organism
No one group can curate all the world’s genomes
Distribute workload across a community of experts to create a community resource
Every sequenced organism with an active experimental community requires a MOD
Integrate genome data with information about the biochemical and genetic network of the organism
Integrate literature-based information with computational predictions
Slide4
Rationale for MODs
Each “complete” genome is incomplete in several respects:
40%-60% of genes have no assigned function
Roughly 7% of those assigned functions are incorrect
Many assigned functions are non-specific
MODs are platforms for global analyses of an organism
Interpret omics data in a pathway context
In silico prediction of essential genes
Characterize systems properties of metabolic and genetic networks
Slide5
What is Curation?
Ongoing updating and refinement of a PGDB
Correct false-positive and false-negative predictions
Incorporate information from experimental literature
Update genome sequence
Update gene functions, gene positions, gene names
Author comments and citations
Add new pathways, modify existing pathways
Enter information about regulatory networks
Slide6
Issues in Creating Public MODs
Scope/prioritize the project
Identify user community
Obtain buy-in and help from scientific community
Obtain funding
IT: Set up database server, Web server
Hire and train curators
Slide7
Administering Pathway Tools
Slide8
New Pathway Tools Releases
Major releases = External software releases
Twice per year
Announced on ptools-users mailing list
Minor releases twice per year affect only our BioCyc.org Web site and flatfile distributions
We support one prior release only
Releases announced on ptools-users@ai.sri.com
Read release notes at http://brg.ai.sri.com/ptools/release-notes.html
Install process:
Upgrade schema of your DB (software assisted)
Slide9
PGDB Storage: File or Relational Database
File storage:
Advantages:
No RDBMS installation and configuration
Disadvantages:
Must be loaded and saved in its entirety
No transaction history
No concurrent access for multiple users
MySQL storage:
Advantages:
Faster read access, faster saves
Concurrent update access for multiple users
Stores transaction history of all PGDB updates
Disadvantages:
RDBMS must be installed and configured
Slide10
Multiuser Access to PGDBs
PGDB stored within one MySQL server
Each curator installs PTools on their computer
Curator computers query RDBMS server via internet
For each frame access, PTools queries: in-memory cache, disk cache, RDBMS server
After curator saves changes, all changes made by other users are loaded into curator’s session
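The frame-access lookup on this slide can be sketched as a tiered cache. This is only an illustration of the general technique, not Pathway Tools code: the class name, the frame ID, the dictionary standing in for the disk cache, and the callable standing in for the RDBMS query are all hypothetical, and the lookup order (memory, then disk, then server) is an assumption read from the order in which the slide lists the tiers.

```python
from typing import Callable, Dict, Optional


class TieredFrameStore:
    """Look up a frame in memory first, then on disk, then on the server."""

    def __init__(self, disk_cache: Dict[str, dict],
                 fetch_from_server: Callable[[str], Optional[dict]]):
        self.memory_cache: Dict[str, dict] = {}
        self.disk_cache = disk_cache                # stand-in for an on-disk cache
        self.fetch_from_server = fetch_from_server  # stand-in for an RDBMS query
        self.server_queries = 0                     # counts trips to the server

    def get_frame(self, frame_id: str) -> Optional[dict]:
        # Tier 1: in-memory cache (fastest).
        if frame_id in self.memory_cache:
            return self.memory_cache[frame_id]
        # Tier 2: disk cache; promote a hit into memory.
        if frame_id in self.disk_cache:
            frame = self.disk_cache[frame_id]
            self.memory_cache[frame_id] = frame
            return frame
        # Tier 3: remote server (slowest); fill both caches on a hit.
        self.server_queries += 1
        frame = self.fetch_from_server(frame_id)
        if frame is not None:
            self.disk_cache[frame_id] = frame
            self.memory_cache[frame_id] = frame
        return frame


# Hypothetical data standing in for the shared PGDB on the server.
SERVER_FRAMES = {"FRAME-1": {"common-name": "example pathway"}}

store = TieredFrameStore(disk_cache={}, fetch_from_server=SERVER_FRAMES.get)
first = store.get_frame("FRAME-1")   # fetched from the server
second = store.get_frame("FRAME-1")  # served from the in-memory cache
```

Under these assumptions, repeated reads of the same frame cost one server round trip, which is why a curator working over the internet mostly sees local-cache speed.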
Slide11
How to Release a PGDB?
Decide on release frequency and schedule
Don’t wait until it’s perfect to release it!
Quality assurance
Run consistency checker
Tools -> Consistency Checker
Also updates organism-summary statistics
Update publications, authors in organism frame
Update via Organism editor
Create new version of PGDB
ptools-local/pgdbs/yeastcyc/1.0/kb/yeastbase.ocelot
Edit against the new version, release the old version
Author release notes
Register PGDB in SRI PGDB registry
Will allow SRI to include it in BioCyc
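The versioned directory layout in the example path on this slide can be illustrated with a small path helper. This is a sketch only: Pathway Tools itself creates the new version, and the function name and the version number "1.1" are hypothetical. The helper assumes the layout shown in the slide's example, with the version directory sitting directly above a kb directory.

```python
from pathlib import PurePosixPath


def next_version_kb_path(current_kb: str, new_version: str) -> str:
    """Return the path of the same KB file under a sibling version directory."""
    path = PurePosixPath(current_kb)
    kb_dir = path.parent          # .../1.0/kb
    version_dir = kb_dir.parent   # .../1.0
    org_dir = version_dir.parent  # .../yeastcyc
    if kb_dir.name != "kb":
        raise ValueError(f"unexpected layout: {current_kb}")
    return str(org_dir / new_version / "kb" / path.name)


# Path taken from the slide; "1.1" is an assumed next version number.
released = "ptools-local/pgdbs/yeastcyc/1.0/kb/yeastbase.ocelot"
editing = next_version_kb_path(released, "1.1")
```

Curators would then edit the file at the new path while the 1.0 copy is the one that gets released.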
Slide12
Pathway Tools Data Import/Export
File->Export
File->Import
Export/import to/from tab-delimited files
Export to GenBank, GFF3 (soon), SBML, BioPAX
Export to attribute-value files
Attribute-value files can be imported into BioWarehouse
Relational database system for bioinformatics database integration
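A tab-delimited export of the kind mentioned on this slide can be processed with standard tools. The sketch below uses Python's csv module and assumes the file has a header row; the column names and rows are hypothetical sample data, not the actual Pathway Tools export format.

```python
import csv
import io
from typing import Dict, List


def read_tab_delimited(text: str) -> List[Dict[str, str]]:
    """Parse tab-delimited text whose first row is a header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical sample content; real exports will have different columns.
sample = (
    "gene\tproduct\n"
    "abc1\thypothetical kinase\n"
    "xyz2\thypothetical transporter\n"
)
rows = read_tab_delimited(sample)
products_by_gene = {row["gene"]: row["product"] for row in rows}
```

For a real export, the same function could be fed the contents of the file, for example via `open(path, newline="").read()`.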
Slide13
Registry: Public PGDB Sharing
PGDB registry maintained by SRI at URL http://biocyc.org/registry.html
Registry operations
List contents of registry
Download PGDBs listed in the registry
Register PGDBs you have created
Slide14
Registry Details
Why register your PGDB?
Facilitate its download by other scientists
Facilitate its inclusion in BioCyc.org
Why download a PGDB?
Desktop Navigator provides faster/more functionality than Web
Comparative operations
Programmatic querying and processing of PGDB
Slide15
Changes Planned for BioCyc.org
BioCyc will be starting a subscription model July 1
Slide16
Why?
Government funding for databases shrinking
BioCyc funding cut 27% as number of genomes climbed 5X in 5 years
No other foreseeable sources of funding for "Big Knowledge" in life sciences
Goals:
Create high-quality curated EcoCyc-like DBs for many organisms
Couple with extensive user-friendly bioinformatics tools
Slide17
How?
Subscription access to BioCyc.org by institutions, individuals
Subscription rates will depend on usage levels from previous year
EcoCyc and MetaCyc will remain free
Pathway Tools will remain free