Data first vs Hypothesis first

Data first vs Hypothesis first - PowerPoint Presentation

Uploaded On 2022-06-13


Presentation Transcript

Slide1

Data first vs Hypothesis first

Alan Ward

Slide2

Data first vs Hypothesis first

Hypothesis driven approach

Look at the data we have

Formulate a hypothesis about ..

Do experiments to test the hypothesis

As a byproduct, collect more data

Weinberg R (2010) Point: Hypotheses first. Nature 464, 678

Slide3

Data first vs Hypothesis first

Data driven approach

Identify a system of interest

Identify an approach to measure/describe attributes of the system

Collect and organise the data

Golub T (2010) Counterpoint: Data first. Nature 464, 679

Slide4

Data first vs Hypothesis first

“Reports that say that something hasn't happened are always interesting to me, because as we know, there are known knowns; there are things we know that we know. There are known unknowns; that is to say, there are things that we now know we don't know. But there are also unknown unknowns – there are things we do not know we don't know.”

United States Secretary of Defense, Donald Rumsfeld

Slide5

Data first vs Hypothesis first

The Black Swan: The Impact of the Highly Improbable. Nassim Taleb

Slide6

Data first vs Hypothesis first

known

Hypothesis driven research

unknown

Enzyme activity

Feedback inhibition

Allosteric regulation

Transcriptional regulation - inducers and repressors

Non-coding short RNAs

Slide7

Data first vs Hypothesis first

Breadth first vs Depth first

A slice up and down

A slice across

Slide8

Observation has always been part of biology, as in the imatinib example (Golub, 2010), but DNA sequencing technology has revolutionized observational data collection. Weinberg (2010) argues that ‘cheap sequencing’ on a massive scale means too much funding goes to data collection.

He doesn’t argue it, but you might also spend all your time managing the data [1].

Data first vs Hypothesis first

[1] Marx, V (2013) Biology: The big challenges of big data. Nature 498, 255–260

Slide9

Depth first or breadth first

Two different strategies for computer search algorithms

Which is best?

That depends heavily on the structure of the search tree and on the number and location of solutions.

If you know a solution is not far from the root of the tree, a breadth first search (BFS) might be better. If the tree is very deep and solutions are rare, a depth first search (DFS) might rootle around forever, whereas BFS could be faster.

If the tree is very wide, a BFS might need too much memory, so it might be completely impractical. If solutions are frequent but located deep in the tree, BFS could be impractical.

If the search tree is very deep you will need to restrict the search depth for DFS anyway.
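The trade-off described above can be illustrated with a minimal sketch. The toy tree, the node names and the function names below are all hypothetical and chosen only for illustration; the DFS takes a depth limit, as the last point suggests.

```python
from collections import deque

# Hypothetical toy search tree: each node maps to its list of children.
TREE = {
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": ["B1"],
    "A1": ["A1a"],
    "A2": [],
    "B1": [],
    "A1a": [],
}


def bfs(tree, start, goal):
    """Breadth first: visit level by level. Returns (found, visit order)."""
    queue = deque([start])
    visited = []
    while queue:
        node = queue.popleft()
        visited.append(node)
        if node == goal:
            return True, visited
        queue.extend(tree[node])
    return False, visited


def dfs(tree, start, goal, max_depth):
    """Depth first with a depth limit. Returns (found, visit order)."""
    stack = [(start, 0)]
    visited = []
    while stack:
        node, depth = stack.pop()
        visited.append(node)
        if node == goal:
            return True, visited
        if depth < max_depth:
            # Push children in reverse so the leftmost child is explored first.
            for child in reversed(tree[node]):
                stack.append((child, depth + 1))
    return False, visited


if __name__ == "__main__":
    for goal in ("B", "A1a"):
        _, bfs_order = bfs(TREE, "root", goal)
        _, dfs_order = dfs(TREE, "root", goal, max_depth=3)
        print(goal, "BFS visits", len(bfs_order), "DFS visits", len(dfs_order))
```

On this toy tree the shallow goal "B" is reached by BFS after 3 visits and by DFS after 6, while the deep goal "A1a" takes BFS 7 visits and DFS 4. With max_depth=2 the DFS never reaches "A1a" at all, which is the price of restricting the depth.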

Data first vs Hypothesis first

Slide10

Data first vs Hypothesis first

EST database

dbEST release 130101

Summary by Organism - 01 January 2013

Number of public entries: 74,186,692

Organism                                              Entries
Homo sapiens (human)                                  8,704,790
Mus musculus + domesticus (mouse)                     4,853,570
Zea mays (maize)                                      2,019,137
Sus scrofa (pig)                                      1,669,337
Bos taurus (cattle)                                   1,559,495
Arabidopsis thaliana (thale cress)                    1,529,700
Danio rerio (zebrafish)                               1,488,275
Glycine max (soybean)                                 1,461,722
Triticum aestivum (wheat)                             1,286,372
Xenopus (Silurana) tropicalis (western clawed frog)   1,271,480
Oryza sativa (rice)                                   1,253,557
Ciona intestinalis                                    1,205,674
Rattus norvegicus + sp. (rat)                         1,162,136
Drosophila melanogaster (fruit fly)                     821,005
…..
Salmonella enterica subsp. enterica serovar Typhi           217
Mycobacterium smegmatis str. MC2 155                         30
Mycobacterium tuberculosis                                   30

Slide11

dbEST references

Boguski, MS, Lowe, TMJ, Tolstoshev, CM (1993) dbEST - Database For Expressed Sequence Tags. Nature Genetics 4, 332-333

Boguski, MSS (1994) Gene discovery in dbEST. Science 265, 1993-4

Boguski, MSS (1995) The turning point in genome research. Trends in Biochemical Sciences 20, 295-6

Nagaraj, S (2007) A hitchhiker's guide to expressed sequence tag (EST) analysis. Briefings in Bioinformatics 8, 6-21

Data first vs Hypothesis first

Slide12

Why DNA?

An example:

Species and strain identification in prokaryotes

DNA:DNA similarity

MLEE (MultiLocus Enzyme Electrophoresis)

MLST (MultiLocus Sequence Typing)

ANI (Average Nucleotide Identity)

Data first vs Hypothesis first

Slide13

Defining species

The modern concept of species dates back to:

Mayr, E. (1942) Systematics and the Origin of Species (Columbia Univ. Press, New York)

Biological species concept: Species are groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups

de Queiroz K (2005) Ernst Mayr and the modern concept of species. Proc Natl Acad Sci U S A. 102 Suppl 1: 6600-7.

Slide14

Bacterial species

Bacteria do not interbreed in the same way, so defining species in bacteria remained an exercise in clustering organisms with similar, initially phenotypic, characters

Stanier RY (1953) Adaptation, evolutionary and physiological: Or Darwinism among the microorganisms. In: Davies R, Gale EF, editors. Adaptation in Microorganisms, Third Symposium of the Society for General Microbiology. Cambridge: Cambridge University Press

Goldner M (2007) The genius of Roger Stanier. Can J Infect Dis Med Microbiol 18, 193–194

Slide15

DNA:DNA similarity

From the 1960s there was a consensus that all taxonomic information about a bacterium is incorporated in the complete nucleotide sequence of its genome

Wayne et al. (1987) correlated the measured DNA similarity of two strains with the species as then defined and concluded that:

A DNA:DNA similarity of 70% or more and a ΔTm of 5°C or less (both are important) mark the boundary of a group of strains which belong to the same species

Wayne, L. G., Brenner, D. J., Colwell, R. R., Grimont, P. A. D., Kandler, O., Krichevsky, M. I., Moore, L. H., Moore, W. E. C., Murray, R. G. E. & other authors (1987). Report of the ad hoc committee on reconciliation of approaches to bacterial systematics. Int J Syst Bacteriol 37, 463–464.
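As a rough illustration of how the two criteria combine, the sketch below applies the Wayne et al. (1987) recommendation as it is usually stated: approximately 70% or greater DNA:DNA relatedness together with a ΔTm of 5°C or less. The function name, parameter names and example values are hypothetical, and real measurements carry experimental error that a hard cut-off ignores.

```python
def same_species(similarity_percent, delta_tm_c,
                 min_similarity=70.0, max_delta_tm=5.0):
    """Return True when both criteria are met.

    similarity_percent: DNA:DNA similarity of the two strains, in percent.
    delta_tm_c: difference in melting temperature (ΔTm), in °C.
    The default thresholds are an assumption based on the usual statement
    of the Wayne et al. (1987) recommendation.
    """
    return similarity_percent >= min_similarity and delta_tm_c <= max_delta_tm


if __name__ == "__main__":
    # Hypothetical strain pairs: (similarity %, ΔTm °C)
    for pair in [(85.0, 2.0), (85.0, 7.0), (60.0, 2.0)]:
        print(pair, same_species(*pair))
```

Both conditions have to hold, so a pair with high similarity but a large ΔTm is still treated as two species.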

Slide16

DNA-DNA similarity

Measuring DNA similarity by hybridisation is not the same as measuring DNA sequence similarity, and it is done using a number of different techniques:

% Similarity

De Ley – rate of renaturation

Ezaki – microplate binding

ΔTm

DNA melting

Elution from hydroxyapatite

The methods are not robust and few labs can do them:

Stackebrandt et al. (2002) Report of the Ad Hoc Committee for the re-evaluation of the species definition in bacteriology. Intl J Systematic Evol Microbiol 52, 1043-1047

Slide17

Melting Temperature analysis

Slide18

DNA Melting

Slide19

Using RT-PCR and SYBR Green for DNA melt curve analysis

Gonzalez, JM & Saiz-Jimenez, C (2005) A simple fluorimetric method for the estimation of DNA–DNA relatedness between closely related microorganisms by thermal denaturation temperatures. Extremophiles 9, 75–79

Slide20

ΔTm determination

Exactly the same melting program, but this time the DNA from Organism 1 and Organism 2 has been mixed, denatured and then renatured at the optimum temperature for renaturation, T_OR, calculated from the %GC (T_OR = 0.51 × (%GC) + 47.0), before adding SYBR Green and melting
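The renaturation temperature formula above is easy to evaluate. The sketch below is only a worked example: the function names are hypothetical, the result is assumed to be in °C, and the %GC is computed here from a short made-up sequence, whereas in practice it would come from the genome or from a measurement.

```python
def gc_percent(sequence):
    """Percentage of G and C bases in a DNA sequence string."""
    sequence = sequence.upper()
    gc_count = sum(1 for base in sequence if base in "GC")
    return 100.0 * gc_count / len(sequence)


def optimal_renaturation_temp(gc):
    """T_OR = 0.51 * (%GC) + 47.0, as given on the slide (assumed to be in °C)."""
    return 0.51 * gc + 47.0


if __name__ == "__main__":
    gc = gc_percent("ATGCGCGCTA")  # made-up sequence, not real data
    print(f"%GC = {gc:.1f}, T_OR = {optimal_renaturation_temp(gc):.1f}")
```

For a genome with 50% GC the formula gives 0.51 × 50 + 47.0 = 72.5, and every additional percentage point of GC raises T_OR by about half a degree.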

Slide21

Disadvantages of DNA-DNA similarity

Because DNA:DNA hybridisation compares the whole genome it has remained the “gold standard” for species delineation, but it has several disadvantages:

It requires large amounts of high quality DNA

The methods are difficult to do

Different methods can give different results

Reciprocal measurements can be very different (the amount of A binding to B is different from the amount of B binding to A)

The experimental measurement has to be made between 2 strains, so obtaining DNA-DNA similarity for 5 strains requires 20 experimental determinations (10 pairs, each measured in both directions), and if a 6th strain needs to be compared another 10 determinations (5 new pairs) are needed

You can’t build an incremental database
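The pairwise cost can be made concrete with a short calculation. The sketch below assumes that every pair of strains is measured in both directions, as the point about reciprocal measurements implies; the function names are hypothetical.

```python
def reciprocal_determinations(n_strains):
    """Determinations needed when every ordered pair (A vs B, B vs A) is measured."""
    return n_strains * (n_strains - 1)


def extra_for_new_strain(n_existing):
    """Additional determinations needed to add one strain to an existing set."""
    return (reciprocal_determinations(n_existing + 1)
            - reciprocal_determinations(n_existing))


if __name__ == "__main__":
    for n in (5, 6, 10, 100):
        print(n, "strains:", reciprocal_determinations(n), "determinations")
    print("adding a 6th strain:", extra_for_new_strain(5), "more")
```

Five strains need 5 × 4 = 20 determinations, and a sixth strain forms 5 new pairs, which is 10 further determinations when each pair is measured both ways. The total grows with the square of the number of strains, which is why the results cannot simply be accumulated in a database the way sequences can.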

Slide22

Disadvantages of DNA-DNA similarity

Slide23

Multilocus Enzyme Electrophoresis

MLEE

Selander, RK, Caugant, DA, Ochman, H, Musser, JM, Gilmour, MN and Whittam, TS (1986) Methods of multilocus enzyme electrophoresis for bacterial population genetics and systematics. Appl. Environ. Microbiol. 51, 873-884

Slide24

Multilocus sequence typing

MLST

Maiden, MCJ, Bygraves, JA, Feil, E, Morelli, G, Russell, JE, Urwin, R, Zhang, Q, Zhou, J, Zurth, K, Caugant, DA, Feavers, IM, Achtman, M, and Spratt, BG (1998) Multilocus sequence typing: A portable approach to the identification of clones within populations of pathogenic microorganisms. Proc. Natl. Acad. Sci. USA 95, 3140–3145

Staphylococcus aureus

Slide25

Multilocus sequence typing

MLST

Portable

Unambiguous

Reproducible

Cumulative

Scalable

Slide26

The traditional method of data reduction is publication: results are summarized in peer-reviewed journals. Publications include only the most important results, from experiments that may have been performed over many years.

The published paper is a concise compilation of the data, an interpretation of the results, and a comparison with results obtained by others.

Data first vs Hypothesis first

A significant fraction of experiments from academic laboratories cannot be repeated in industry [1], reflecting inadequate description of experiments performed on different equipment and on biological samples that were produced with disparate methods.

[1] Begley CG & Ellis LM (2012) Drug development: Raise standards for preclinical cancer research. Nature 483, 531–3

Slide27

Data first vs Hypothesis first

In 1991 the GenBank On-line Service utilized a Solbourne 5/800 running OS/MP 4.0C. The database work was done on a Sun network 4/490 server and workstations running SunOS UNIX version 4.1. The GenBank database was maintained on a Sybase relational database management system (RDBMS). Software was developed in the C language.

In the 1990s NCBI scanned the literature for sequences and manually typed them into the database.

Slide28

Data first vs Hypothesis first

Benson, DA, Cavanaugh, M, Clark, K, Karsch-Mizrachi, I, Lipman, DJ, Ostell J and Sayers EW (2013) GenBank. Nucleic Acids Research 41, D36–D42