Designing an IT infrastructure for data-intensive collaborative -omics projects

Uploaded On 2020-11-06


Presentation Transcript

Slide1

Designing an IT infrastructure for data-intensive collaborative -omics projects

Stathis Kanterakis
kanterae@ebi.ac.uk
European Bioinformatics Institute
Cambridge, UK
ICTA 2011

Slide2

Outline
Introduction
Why design at all?
Principles of collaborative design
A software suite for cross-disciplinary collaborative studies
Results
Conclusions

Slide3

Introduction

Slide4

The “central dogma” of information flow in molecular biology

DNA → RNA → Protein
Replication (DNA synthesis): DNA → DNA
Transcription (RNA synthesis): DNA → RNA
Translation (protein synthesis): RNA → Protein

Source: http://www.rsc.org/chemistryworld/Issues/2009/November/BiologysNobelMoleculeFactory.asp

Slide5

The -omics cascade

GENOMICS: What CAN happen
TRANSCRIPTOMICS: What APPEARS to happen
PROTEOME: What MAKES it happen
METABOLOME: What HAS happened
PHENOTYPE

Source: Systems Biology and the Omics Cascade, Karolinska Institutet, June 9-13, 2008

Slide6

http://xkcd.com/793/

Slide7

407: -omes and -omics terms (1)
330: Genomes sequenced to date (2)
3B: Size of human genome in bases
$10k: Cost to sequence a single human (3)
30k: Interdisciplinary bachelor's degrees awarded in 2005 in USA (4)

Sources:
1 http://omics.org/index.php/Alphabetically_ordered_list_of_omes_and_omics
2 http://www.ensemblgenomes.org/
3 http://www.genome.gov/sequencingcosts/
4 http://en.wikipedia.org/wiki/Interdisciplinarity

Slide8

Slide9

Challenges in -omics research

Expensive studies
Small number of replicates (n) (microarrays, subjects...)
Large number of variables (genes, proteins, etc.)

This results in:
Inflated type I error (false positives)
Poor statistical power (few true positives detected)
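A rough illustration of why many variables inflate the false-positive count: if m independent tests are each run at significance level alpha, the chance of at least one false positive is 1 - (1 - alpha)^m. The sketch below is not from the slides. The figure of 20,000 genes and alpha = 0.05 are assumed values chosen only for illustration, and the Bonferroni correction is shown as one common remedy, not as the method used in the projects described here.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives when every null hypothesis is true."""
    return alpha * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    alpha = 0.05      # assumed per-test significance level
    n_genes = 20_000  # assumed number of variables (hypothetical)

    print("Family-wise error rate:", familywise_error_rate(alpha, n_genes))
    print("Expected false positives:", expected_false_positives(alpha, n_genes))
    print("Bonferroni per-test threshold:", bonferroni_threshold(alpha, n_genes))
```

The stricter per-test threshold controls false positives but, with a small number of replicates, lowers statistical power further, which is the trade-off the slide points at.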

Slide10

Why design at all?

http://xkcd.com/970/

Slide11

Volume vs Complexity cost model

Project: MolPAGE
Samples: 16.5k
Research subjects: 2.2k
Studies/data types: 300/11
Assays: 26 000/11
Files/volume: 27 000/0.7 TB
Users/roles/user groups: 80/1/1
Publications per year: 1

Project: ENGAGE
Samples: >100k
Research subjects: 100k
Studies/data types: 400/13
Assays: ***
Files/volume: 400/0.25 TB
Users/roles/user groups: 30/5/13
Publications per year: 10

Figure axes: volume (V) and complexity (C), where C ~ data types * user roles * scripts
Figure annotations: "Growth of complexity is slower than volume"; "Both volume and complexity grow fast"

Maria Krestyaninova, 2009
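The proportionality C ~ data types * user roles * scripts can be tried on the two projects in the table. This is a minimal sketch, assuming the data-type and role counts as read from the table (11 and 1 for MolPAGE, 13 and 5 for ENGAGE). The slide gives no script counts, so `scripts` is a placeholder set to 1, and the `Project` class and `complexity_proxy` function are hypothetical names; the results are relative proxies only, not values from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    data_types: int
    user_roles: int
    volume_tb: float
    scripts: int = 1  # placeholder: script counts are not given on the slide


def complexity_proxy(project: Project) -> int:
    """Relative complexity following C ~ data types * user roles * scripts."""
    return project.data_types * project.user_roles * project.scripts


# Counts taken from the table; scripts left at the placeholder value.
molpage = Project("MolPAGE", data_types=11, user_roles=1, volume_tb=0.7)
engage = Project("ENGAGE", data_types=13, user_roles=5, volume_tb=0.25)

if __name__ == "__main__":
    for p in (molpage, engage):
        print(p.name, "volume (TB):", p.volume_tb, "complexity proxy:", complexity_proxy(p))
```

Under these assumptions ENGAGE has the higher complexity proxy despite the smaller stored volume, which is why the model treats volume and complexity as separate axes.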

Slide12

Ome vs Omics

Figure labels: cost axis ($3,000,000,000; $10,000; ~$0); time axis (2003; 2010; 2016); annotations: "Ome and Omics balance point", "$50,000 per person"

Source: http://omics.org/index.php/File:Ome_versus_omics_graph_by_Jong_Bhak_openfree.gif

Slide13

Reporting requirements for publication

DataShaper, OBO

ISA-TAB, MAGE-TAB, MIBBI

Bioconductor

Slide14

Nobody wants a cellphone that makes calls!

Make your application:
Contextualized
Usable
Enjoyable
Visible (increases reputation)
Sociable
Valuable
Explorable
Flexible

In a participatory way

Slide15

OPEN-SOURCE collaborative design

Slide16

Maxims of the post-information era

“If the news is important, it will find me”
“Information wants to be free”
“It's not information overload, it's filter failure”
“The people formerly known as the audience”
“The sources go direct”
and finally…

Source: http://markcoddington.com/2010/01/30/a-quick-guide-to-the-maxims-of-new-media/

Slide17

“Do what you do best, link the rest”

http://xkcd.com/974/

Slide18

Agile development

Individuals & interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

In practice: frequent iterations over customer feedback, trust

Slide19

Metadesign

Phases: Analysis, Concept design, Concept communication, Distribution, End-of-life
Participation level: none, indirect, consultative, shared control, full control

Courtesy of Massimo Menichinelli, http://www.openp2pdesign.org/

Slide20

Software for cross-disciplinary collaborative studies
SIMBioMS

Slide21

The big picture

Diagram labels:
CENTRAL DATA ARCHIVES
SIMBIOMS, OBIBA, ISA, QURETEC, METABAR, etc.
dynamic storage, project hosting, fast exchange
permanent deposition, large volumes, open access
support for collaborative discovery
knowledge access and sustainability
large consortia
stand alone researchers

Maria Krestyaninova, 2009

Slide22

System overview

Diagram labels:
USERS
DATA PROVIDERS
Biobanks
-omics
Experiment DB
Sample DB
Public Index
submission (shown twice)
controlled access
open access

Maria Krestyaninova, 2009

Slide23

Current infrastructural volume

12 installations in 3 countries
100 user-organisations
>50,000 samples
>50,000 assays and studies
4 large federated R&D projects across Europe and Russia

Krestyaninova et al, Bioinformatics, 2009
Viksna et al, BMC Bioinformatics, 2007

Slide24

SIMBIOMS in collaborative biomedical research initiatives

Columns: Project; Goal/Description; Funded by; SIMBioMS team involvement

Strategic research collaborations

BBMRI (www.bbmri.eu)
Goal/Description: Build a network of population-based biobanks and experts, and foster collaboration between them. Provide advice to industry.
Funded by: EC, OECD
SIMBioMS team involvement: Prototyping of data management model, use-case design, discussions.

P3G (www.p3g.org)
Funded by: Canadian Gov., memberships
SIMBioMS team involvement: Leading international Informatics Working Group; discussions.

ELIXIR (www.elixir-europe.org/page.php)
Goal/Description: Create a sustainable infrastructure for the storage and distribution of information produced by bioscientists.
Funded by: EC
SIMBioMS team involvement: Prototyping, reports, cooperation with organisation of medical informatics committee on behalf of EBI.

TaraOceans (oceans.taraexpeditions.org)
Goal/Description: 3-year long circumnavigation expedition for marine genomics and climate integrative study.
Funded by: CNRS, industry, potentially EC
SIMBioMS team involvement: Preliminary design of data management solution; meetings, discussions.

Services for research collaborations

ENGAGE (www.euengage.org)
Goal/Description: Genetic and genomic research for clinical application.
Funded by: EC
SIMBioMS team involvement: Design, development and maintenance of dedicated data exchange services, based on SIMBioMS.

MolPAGE (www.molpage.org)
Goal/Description: Biomarkers: discovery and development of novel high-throughput methods.
Funded by: EC

MuTHER
Goal/Description: Exploration of gene expression in multiple tissues on 1000 twins associated with aging.
Funded by: Wellcome Trust

SIROCCO (www.sirocco-project.eu)
Goal/Description: Study of small RNAs as regulatory cell mechanism; therapeutic applications.
Funded by: EC

CAGEKID
Goal/Description: Kidney cancer study.
Funded by: EC

SUMMIT
Goal/Description: Surrogate markers for vascular Micro- & Macrovascular hard endpoints for Innovative diabetes Tools
Funded by: EC

Slide25

Anton Enright, 2011

Slide26

Conclusions

Slide27

Complex interactions

Who has a say in knowledge extracted from information?
Research subjects: Consent to particular research being conducted
Scientists: Protective of vision about their data
Funding sources: Expect publications from grantees

Diagram labels: Pharma, BioBanks, Research Institutions, big data, industry, academia, state, FDA, Ministry of Health, Ministry of Education

Yulia Tammisto, 2011

Slide28

Complex software

TIME is the scarcest resource

Software adoption due to:
Requirements
No other way to do things

Usefulness

Use = 1 – Reuse

Slide29

One goal

Search for the truth

Slide30

Thank you!

Acknowledgements:
Maria Krestyaninova
Ugis Sarkans
Anton Enright
Mat Davis
Yulia Tammisto
Massimo Menichinelli
Teemu Perheentupa
Jani Heikkinen
Balaji Rajashekar
Raivo Kolde
Jaak Vilo
Uniquer

www.simbioms.org