Slide1
Designing an IT infrastructure for data-intensive collaborative -omics projects
Stathis Kanterakis
kanterae@ebi.ac.uk
European Bioinformatics Institute
Cambridge, UK
ICTA 2011
Slide2
Outline
Introduction
Why design at all?
Principles of collaborative design
A software suite for cross-disciplinary collaborative studies
Results
Conclusions
Slide3
Introduction
Slide4
The “central dogma” of information flow in molecular biology
DNA → RNA → Protein
Replication (DNA synthesis): DNA → DNA
Transcription (RNA synthesis): DNA → RNA
Translation (protein synthesis): RNA → Protein
Source: http://www.rsc.org/chemistryworld/Issues/2009/November/BiologysNobelMoleculeFactory.asp
Slide5
The -omics cascade
GENOMICS: What CAN happen
TRANSCRIPTOMICS: What APPEARS to happen
PROTEOME: What MAKES it happen
METABOLOME: What HAS happened
PHENOTYPE
Source: Systems Biology and the Omics Cascade, Karolinska Institutet, June 9-13, 2008
Slide6
http://xkcd.com/793/
Slide7
407: -omes and -omics terms [1]
330: Genomes sequenced to date [2]
3B: Size of human genome in bases
$10k: Cost to sequence a single human [3]
30k: Interdisciplinary bachelor's degrees awarded in 2005 in the USA [4]
Sources:
1 http://omics.org/index.php/Alphabetically_ordered_list_of_omes_and_omics
2 http://www.ensemblgenomes.org/
3 http://www.genome.gov/sequencingcosts/
4 http://en.wikipedia.org/wiki/Interdisciplinarity
Slide8
Slide9
Challenges in -omics research
Expensive studies
Small number of replicates (n) (microarrays, subjects...)
Large number of variables (genes, proteins, etc.)
This results in:
Inflated type I error (false positives)
Poor statistical power (missed true positives)
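A minimal sketch of why many variables inflate the type I error, assuming independent tests at a fixed significance level (an idealisation; real gene or protein measurements are correlated). The function names and the example test counts are illustrative and not from the slides. The Bonferroni threshold is shown as one standard correction, not necessarily the one used in the projects discussed here.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05
    # Illustrative numbers of variables tested (e.g. genes on a microarray).
    for n_tests in (1, 10, 100, 20000):
        fwer = familywise_error_rate(alpha, n_tests)
        per_test = bonferroni_threshold(alpha, n_tests)
        print(f"{n_tests:>6} tests: FWER = {fwer:.4f}, Bonferroni per-test alpha = {per_test:.2e}")
```

The stricter per-test threshold is also what costs power when the number of replicates is small: fewer true effects clear the bar.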
Slide10
Why design at all?
http://xkcd.com/970/
Slide11
Volume vs Complexity cost model
Project | Samples | Research subjects | Studies/data types | Assays | Files/volume | Users/roles/user groups | Publ-s per year
MolPAGE | 16.5k | 2.2k | 300/11 | 26 000/11 | 27 000/0.7 TB | 80/1/1 | 1
ENGAGE | >100k | 100k | 400/13 | *** | 400/0.25 TB | 30/5/13 | 10
C ~ data types × user roles × scripts
Plot: volume (V) against complexity (C), with two regions labelled:
Growth of complexity is slower than volume
Both volume and complexity grow fast
Maria Krestyaninova, 2009
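The proportionality C ~ data types × user roles × scripts can be sketched as follows. This is an illustration under stated assumptions, not the author's model: the data-type and user-role counts are taken from the second number of "Studies/data types" and the middle number of "Users/roles/user groups" in the table, the slide gives no script counts so `scripts=1` is a hypothetical placeholder, and the proportionality constant is unknown, so only ratios between projects are meaningful.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    data_types: int
    user_roles: int
    scripts: int  # hypothetical: the slide does not report script counts


def complexity(project: Project) -> int:
    """Relative complexity C ~ data types * user roles * scripts (unitless)."""
    return project.data_types * project.user_roles * project.scripts


# data_types and user_roles as read from the table; scripts=1 is a placeholder.
molpage = Project("MolPAGE", data_types=11, user_roles=1, scripts=1)
engage = Project("ENGAGE", data_types=13, user_roles=5, scripts=1)

if __name__ == "__main__":
    for project in (molpage, engage):
        print(project.name, complexity(project))
```

Because the factors multiply, adding user roles raises complexity much faster than adding files raises volume.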
Slide12
Ome vs Omics
Figure: cost axis ($3,000,000,000; $10,000; ~$0) against time (2003 to 2016); the "Ome and Omics balance point" is marked at 2010, $50,000 per person.
Source: http://omics.org/index.php/File:Ome_versus_omics_graph_by_Jong_Bhak_openfree.gif
Slide13
Reporting requirements for publication
DataSHaPER, OBO
ISA-TAB, MAGE-TAB, MIBBI
Bioconductor
Slide14
Nobody wants a cellphone that makes calls!
Make your application:
Contextualized
Usable
Enjoyable
Visible (increases reputation)
Sociable
Valuable
Explorable
Flexible
In a participatory way…
Slide15
OPEN-SOURCE collaborative design
Slide16
Maxims of the post-information era
“If the news is important, it will find me”
“Information wants to be free”
“It's not information overload, it's filter failure”
“The people formerly known as the audience”
“The sources go direct”
and finally…
Source: http://markcoddington.com/2010/01/30/a-quick-guide-to-the-maxims-of-new-media/
Slide17
“Do what you do best, link the rest”
http://xkcd.com/974/
Slide18
Agile development
Individuals & interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
In practice: frequent iterations over customer feedback, trust
Slide19
Metadesign
Phases: Analysis, Concept design, Concept communication, Distribution, End-of-life
Participation level: none, indirect, consultative, shared control, full control
Courtesy of Massimo Menichinelli, http://www.openp2pdesign.org/
Slide20
Software for cross-disciplinary collaborative studies
SIMBioMS
Slide21
The big picture
Diagram labels:
CENTRAL DATA ARCHIVES, SIMBIOMS, OBIBA, ISA, QURETEC, METABAR, etc.
dynamic storage, project hosting, fast exchange
permanent deposition, large volumes, open access
support for collaborative discovery
knowledge access and sustainability
large consortia
stand-alone researchers
Maria Krestyaninova, 2009
Slide22
System overview
Diagram labels: USERS; DATA PROVIDERS; Biobanks; -omics; Experiment DB; Sample DB; Public Index; submission; controlled access; open access
Maria Krestyaninova, 2009
Slide23
Current infrastructural volume
12 installations in 3 countries
100 user-organisations
>50,000 samples
>50,000 assays and studies
4 large federated R&D projects across Europe and Russia
Krestyaninova et al, Bioinformatics, 2009
Viksna et al, BMC Bioinformatics, 2007
Slide24
SIMBIOMS in collaborative biomedical research initiatives
Project | Goal/Description | Funded by | Simbioms team involvement
Strategic research collaborations
BBMRI (www.bbmri.eu) | Build a network of population-based biobanks, experts, and foster collaboration between them. Provide advice to industry. | EC, OECD | Prototyping of data management model, use-case design, discussions.
P3G (www.p3g.org) | | Canadian Gov., memberships | Leading international Informatics Working Group; discussions.
ELIXIR (www.elixir-europe.org/page.php) | Create a sustainable infrastructure for the storage and distribution of information produced by bioscientists. | EC | Prototyping, reports, cooperation with organisation of medical informatics committee on behalf of EBI.
TaraOceans (oceans.taraexpeditions.org) | 3-year-long circumnavigation expedition for marine genomics and climate integrative study. | CNRS, industry, potentially EC | Preliminary design of data management solution; meetings, discussions.
Services for research collaborations
ENGAGE (www.euengage.org) | Genetic and genomic research for clinical application. | EC | Design, development and maintenance of dedicated data exchange services, based on SIMBioMS.
MolPAGE (www.molpage.org) | Biomarkers: discovery and development of novel high-throughput methods. | EC |
MuTHER | Exploration of gene expression in multiple tissues on 1000 twins associated with aging. | Wellcome Trust |
SIROCCO (www.sirocco-project.eu) | Study of small RNAs as regulatory cell mechanism; therapeutic applications. | EC |
CAGEKID | Kidney cancer study. | EC |
SUMMIT | Surrogate markers for vascular Micro- & Macrovascular hard endpoints for Innovative diabetes Tools | EC |
Slide25
Anton Enright, 2011
Slide26
Conclusions
Slide27
Complex interactions
Who has a say in knowledge extracted from information?
Research subjects: Consent to particular research being conducted
Scientists: Protective of vision about their data
Funding sources: Expect publications from grantees
Diagram labels: Pharma; BioBanks; Research Institutions; big data; industry; academia; state; FDA; Ministry of Health; Ministry of Education
Yulia Tammisto, 2011
Slide28
Complex software
TIME is the scarcest resource
Software adoption due to:
Requirements
No other way to do things
Usefulness
Use = 1 – Reuse
Slide29
One goal
Search for the truth
Slide30
Thank you!
Acknowledgements:
Maria Krestyaninova
Ugis Sarkans
Anton Enright
Mat Davis
Yulia Tammisto
Massimo Menichinelli
Teemu Perheentupa
Jani Heikkinen
Balaji Rajashekar
Raivo Kolde
Jaak Vilo
Uniquer
www.simbioms.org