
Phylogenetics - PowerPoint Presentation

lois-ondreau
Uploaded On 2017-08-22

Presentation Transcript

Slide1

Phylogenetics

“Inferring Phylogenies” by Joseph Felsenstein: excellent reference

Slide2

What is a phylogeny?

Slide3

Different Representations

Cladogram - branching pattern only
Phylogram - branch lengths are estimated and drawn proportional to the amount of change along the branch

Rooted - implies directionality of change
Unrooted - does not
How do you root a tree?

Slide4

What is a phylogeny used for?

Slide5

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Slide6

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Slide7

Working Tree

[Figure: working tree with tips sp1, sp4, sp2, sp3, sp5; character label shown: c2]

Slide8

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Slide9

Working Tree

[Figure: working tree with tips sp1, sp4, sp2, sp3, sp5; character labels shown: c2, c4]

Slide10

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Slide11

Working Tree

[Figure: working tree with tips sp1, sp4, sp2, sp3, sp5; character labels shown: c2, c4, c7]

Slide12

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Slide13

Working Tree

[Figure: working tree with tips sp1, sp4, sp2, sp3, sp5; character labels shown: c2, c4, c7, c9]

Slide14

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Slide15

Working Tree

[Figure: working tree with tips sp1, sp4, sp2, sp3, sp5; character labels shown: c2, c4, c7, c9, c10]

Slide16

Estimate a Phylogeny

Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA

Slide17

Final Tree

[Figure: final tree with tips sp1, sp4, sp2, sp3, sp5; character labels shown: c2, c4, c7, c9, c10, c11]

Slide18
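The character labels on the Working Tree and Final Tree slides (c2, c4, c7, c9, c10, c11) appear to correspond to the columns of the five-species alignment that vary among taxa. The sketch below checks that reading by scanning the alignment column by column. The function names are illustrative, and the second function uses the usual definition of a parsimony-informative site (at least two states, each present in two or more taxa), which the slides do not spell out.

```python
# Alignment copied from the "Estimate a Phylogeny" slides.
alignment = {
    "sp1": "ACCGTCTTGTTA",
    "sp2": "AGCGTCATCAAA",
    "sp3": "AGCGTCATCAAA",
    "sp4": "ACCGTCTTGATA",
    "sp5": "AGCCTCTTCATA",
}


def variable_sites(aln):
    """Return 1-based column numbers where the taxa do not all share one state."""
    columns = zip(*aln.values())
    return [i for i, col in enumerate(columns, start=1) if len(set(col)) > 1]


def informative_sites(aln):
    """Return 1-based columns with two or more states that each occur in >= 2 taxa."""
    sites = []
    for i, col in enumerate(zip(*aln.values()), start=1):
        shared_states = [s for s in set(col) if col.count(s) >= 2]
        if len(shared_states) >= 2:
            sites.append(i)
    return sites


print(variable_sites(alignment))     # [2, 4, 7, 9, 10, 11]
print(informative_sites(alignment))  # [2, 7, 9, 11]
```

Columns 4 and 10 vary but each separates only a single species (sp5 and sp1 respectively), so they add length to any tree without favoring one grouping over another.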

What optimality criteria do we use then?

Parsimony
Likelihood
Bayesian

Distance methods?

Slide19

Parsimony

Why should we choose a specific grouping?
Maximum parsimony: we should accept the hypothesis that explains the data most simply and efficiently

“Parsimony is simply the most robust criterion for choosing between competing scientific hypotheses. It is not a statement about how evolution may or may not have taken place” [1]

[1] Kitching, I. J.; Forey, P. L.; Humphries, C. J. & Williams, D. M. 1998. Cladistics: The Theory and Practice of Parsimony Analysis. The Systematics Association Publication No. 11.

Slide20

Parsimony

Optimality criterion that chooses the topology with the fewest transformations of character states
Optimizing one component: tree topology (pattern based)

Most parsimonious tree: the one (or several) with the minimum number of evolutionary changes (smallest size/tree length)

Slide21

Reconstructing trees via sequence data

Taxon  1  2  3  4  5  6
O      T  G  T  A  A  T
A      A  A  T  G  A  G
B      A  G  C  C  -  G
C      A  A  T  G  A  T
D      A  G  C  C  -  T

[Figure: tree with tips A, O, D, C, B and the following changes mapped onto its branches]

1. T=>A
2. G=>A
3. T=>C
4. A=>G
4. A=>C
5. A=>GAP
6. T=>G
6. T=>G

Tree length = 8

Slide22
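Returning to the six-character parsimony example with taxa O, A, B, C and D: the tree length of 8 can be reproduced with a small-parsimony count. This is a minimal sketch using Fitch's algorithm, which the slides do not name. The topology (O,((A,C),(B,D))) is an assumption inferred from the listed changes, since the tree figure did not survive in the transcript, and the gap is treated as a fifth state because the slide counts A=>GAP as one change.

```python
# Character matrix from the slide: five taxa scored for six characters.
matrix = {
    "O": "TGTAAT",
    "A": "AATGAG",
    "B": "AGCC-G",
    "C": "AATGAT",
    "D": "AGCC-T",
}

# ASSUMPTION: topology inferred from the listed changes, not shown in the transcript.
tree = ("O", (("A", "C"), ("B", "D")))


def fitch(node, site):
    """Return (state set, change count) for one character on a nested-tuple tree."""
    if isinstance(node, str):  # a tip: its observed state, no changes yet
        return {matrix[node][site]}, 0
    left_states, left_cost = fitch(node[0], site)
    right_states, right_cost = fitch(node[1], site)
    shared = left_states & right_states
    if shared:  # children agree on at least one state: no extra change
        return shared, left_cost + right_cost
    # children disagree: take the union and count one change
    return left_states | right_states, left_cost + right_cost + 1


def tree_length(topology, n_sites):
    """Sum the minimum number of changes over all characters."""
    return sum(fitch(topology, site)[1] for site in range(n_sites))


print(tree_length(tree, 6))  # 8
```

Scoring a competing topology such as `("O", (("A", "B"), ("C", "D")))` with the same function gives a longer tree, which is the comparison the parsimony criterion relies on.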

Neighbor-joining Method

Slide23

NJ distance matrices

Slide24

NJ distance matrices

Slide25

NJ distance matrices

Slide26

NJ distance matrices

Slide27

Finished NJ tree

Slide28

Models of Evolution

[Figure: the four bases grouped as pyrimidines (T, C) and purines (A, G), with arrows marking transitions and transversions]

Slide29
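In the Models of Evolution figure, transitions are substitutions within a class (purine to purine, or pyrimidine to pyrimidine) and transversions are substitutions between the two classes. A minimal sketch of that classification follows. The function names are illustrative, and the two sequences compared are sp1 and sp2 from the earlier alignment slides.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Label a pair of bases as identical, transition, transversion, or other."""
    bases = PURINES | PYRIMIDINES
    if a not in bases or b not in bases:
        return "other"  # gaps or ambiguity codes are not classified here
    if a == b:
        return "identical"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def count_ts_tv(seq1, seq2):
    """Count (transitions, transversions) between two aligned sequences."""
    kinds = [classify(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")


print(classify("A", "G"))  # transition
print(classify("A", "T"))  # transversion
print(count_ts_tv("ACCGTCTTGTTA", "AGCGTCATCAAA"))  # (0, 5)
```

Every difference between sp1 and sp2 is a C/G or T/A change, so this pair shows transversions only.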

Maximum Likelihood

Base frequencies: $f_A + f_G + f_C + f_T = 1$
Base exchange: $f_s + f_v = 1$
R-matrix: the six relative substitution-rate parameters sum to 1
Gamma shape parameter
Number of discrete gamma-distribution categories
Pinvar: $f_{var} + f_{inv} = 1$

Likelihood: $L = \prod_i l_i$, where $i$ indexes each character

Slide30

Maximum Likelihood

L=Pr(D|H)

[Figure: tree with internal nodes w, z, x, y; tip states G, C, G, G, A; branch lengths t1 to t8]

Slide31

ML cont.

The probability that the nucleotide at time t is i is given by [equation not captured in transcript]

The probability that the nucleotide at time t is j, j ≠ i, is given by [equation not captured in transcript]

Slide32
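The two probabilities described on the "ML cont." slide were not captured in the transcript, and the slides do not say which substitution model they come from. As an illustration only, the sketch below assumes the Jukes-Cantor (JC69) model, under which a branch of length t (expected substitutions per site) gives probability 1/4 + (3/4)e^(-4t/3) of ending in the starting base and 1/4 - (1/4)e^(-4t/3) of ending in each other base. It then evaluates L = Pr(D|H) for one site by Felsenstein's pruning algorithm. The topology and branch lengths are invented; only the tip states G, C, G, G, A and the count of eight branches follow the slide.

```python
import math

BASES = "ACGT"


def jc69(i, j, t):
    """P(base j at the end of a branch of length t | base i at its start), JC69."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if i == j else 0.25 - 0.25 * decay


def partials(node):
    """Conditional likelihoods {base: P(tips below node | base at node)}."""
    if isinstance(node, str):  # a tip: certain of its observed base
        return {b: 1.0 if b == node else 0.0 for b in BASES}
    result = {b: 1.0 for b in BASES}
    for child, t in node:  # an internal node is a list of (child, branch length)
        child_part = partials(child)
        for b in BASES:
            result[b] *= sum(jc69(b, c, t) * child_part[c] for c in BASES)
    return result


def site_likelihood(root):
    """Likelihood of one site, averaging over equal base frequencies at the root."""
    return sum(0.25 * p for p in partials(root).values())


# HYPOTHETICAL tree: tips G, C, G, G, A joined by eight branches of made-up length.
tree = [
    ([("G", 0.1), ("C", 0.1)], 0.05),
    ([("G", 0.1), ([("G", 0.1), ("A", 0.1)], 0.05)], 0.05),
]

print(site_likelihood(tree))
```

For a full alignment the per-site values would be multiplied together (or their logarithms summed), matching the product over characters on the previous slide.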

Bayes Theorem

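$$\mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(H)\,\mathrm{Prob}(D \mid H)}{\mathrm{Prob}(D)}$$

H = Hypothesis
D = Data

Prob(H): prior probability, or marginal probability, of H
Prob(H | D): the conditional probability of H given D (posterior probability)
Prob(D | H): likelihood function
Prob(D): prior probability, or marginal probability, of D, where $\mathrm{Prob}(D) = \sum_H P(H)\,P(D \mid H)$

Normalizing constant: ensures $\sum_H P(H \mid D) = 1$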
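A small numerical sketch of Bayes' theorem applied to a discrete set of hypotheses may help here. The three hypotheses stand for the three unrooted topologies possible for four taxa; the prior and likelihood values are invented for illustration and do not come from the slides.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem to a discrete set of hypotheses."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # Prob(D), the normalizing constant
    return {h: joint[h] / evidence for h in joint}


# HYPOTHETICAL inputs: equal priors and made-up likelihoods Prob(D | H).
priors = {"tree1": 1 / 3, "tree2": 1 / 3, "tree3": 1 / 3}
likelihoods = {"tree1": 0.008, "tree2": 0.002, "tree3": 0.0005}

post = posterior(priors, likelihoods)
for name, p in post.items():
    print(name, round(p, 3))
print(sum(post.values()))  # sums to 1, up to floating-point rounding
```

With equal priors the posterior is just the likelihood rescaled by the normalizing constant; unequal priors would shift the ranking accordingly.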

Slide33

Take Home Message

Likelihood: represents the probability of the data given the hypothesis => difficult to interpret

Bayes approach: estimates the probability of the hypothesis given the data => estimates the probability for the hypothesis of interest

Slide34

Bayesian Inference of Phylogeny

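Calculating the posterior probability (pP) of a tree involves a summation over all possible trees and, for each tree, integration over all combinations of branch lengths (bl) and substitution-model parameter values.

In the equations below, $\tau_i$ is the $i$-th tree, $\upsilon_i$ its branch lengths, $\theta$ the substitution-model parameters, $X$ the data, and $B(s)$ the number of possible trees.

$$f(\tau_i \mid X) = \frac{f(\tau_i)\, f(X \mid \tau_i)}{\sum_{j=1}^{B(s)} f(\tau_j)\, f(X \mid \tau_j)}$$

$$f(\tau_i, \upsilon_i, \theta \mid X) = \frac{f(\tau_i, \upsilon_i, \theta)\, f(X \mid \tau_i, \upsilon_i, \theta)}{\sum_{j=1}^{B(s)} \int_{\upsilon} \int_{\theta} f(\tau_j, \upsilon_j, \theta)\, f(X \mid \tau_j, \upsilon_j, \theta)\, d\upsilon\, d\theta}$$

$$f(\tau_i \mid X) = \frac{\int_{\upsilon} \int_{\theta} f(\tau_i, \upsilon_i, \theta)\, f(X \mid \tau_i, \upsilon_i, \theta)\, d\upsilon\, d\theta}{\sum_{j=1}^{B(s)} \int_{\upsilon} \int_{\theta} f(\tau_j, \upsilon_j, \theta)\, f(X \mid \tau_j, \upsilon_j, \theta)\, d\upsilon\, d\theta}$$

Inferences of any single parameter are based on the marginal distribution of the parameter.

This marginal probability distribution of the topology, for example, integrates out all the other parameters.

Advantage: the power of the analysis is focused on the parameter of interest (i.e., the topology of the tree).

Slide35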

Estimating phylogenies

Exhaustive Searches
Branch and bound methods
Rise in computational time versus rise in solution space

Slide36

How many topologies are there?

Slide37
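The question of how many topologies exist has a closed-form answer for strictly bifurcating trees: (2n-5)!! unrooted trees for n taxa (n >= 3) and (2n-3)!! rooted trees (n >= 2), where !! is the double factorial. The sketch below, with illustrative function names, shows how quickly the solution space grows, which is why exhaustive and branch-and-bound searches become impractical for large data sets.

```python
def double_factorial(n):
    """Product n * (n - 2) * (n - 4) * ... down to 1 or 2; returns 1 for n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def unrooted_trees(n_taxa):
    """Number of unrooted bifurcating topologies for n_taxa >= 3."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    return double_factorial(2 * n_taxa - 5)


def rooted_trees(n_taxa):
    """Number of rooted bifurcating topologies for n_taxa >= 2."""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa")
    return double_factorial(2 * n_taxa - 3)


print(unrooted_trees(5), rooted_trees(5))  # 15 105
print(unrooted_trees(10))                  # 2027025
# Number of decimal digits in the count for 397 taxa:
print(len(str(unrooted_trees(397))))
```

Python integers have arbitrary precision, so the count for hundreds of taxa can be computed exactly even though it could never be enumerated.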

The Phylogenetic Problem

Slide38

HIV-1 Whole Genomes
1993: 15

HIV-1 Whole Genomes
2003 (JAN): 397

Slide39

Tree Space - the final frontier

Slide40

Heuristic Searches

Nearest-neighbor interchanges (NNI) - swap two adjacent branches on the tree

Subtree pruning and regrafting (SPR) - a branch (either interior or exterior) is removed from the tree with a subtree attached to it. The subtree is then reinserted into the remaining tree in all possible places

Tree bisection and reconnection (TBR) - an interior branch is broken, and the two resulting fragments of the tree are considered as separate trees. All possible connections are made between a branch of one and a branch of the other.

Slide41
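Of the three rearrangements on the Heuristic Searches slide, NNI is the simplest to write down. An interior branch separates four subtrees, two on each side, and exchanging one subtree across the branch yields exactly two alternative arrangements. The sketch below is illustrative only: the function names are invented, and subtrees are represented as nested tuples or tip labels.

```python
def nni_neighbors(a, b, c, d):
    """Given an interior branch separating subtrees (a, b) from (c, d),
    return the two arrangements reachable by one nearest-neighbor interchange."""
    return [((a, c), (b, d)), ((a, d), (b, c))]


def nni_neighborhood_size(n_taxa):
    """An unrooted bifurcating tree on n taxa has n - 3 interior branches,
    each offering two interchanges."""
    if n_taxa < 4:
        return 0
    return 2 * (n_taxa - 3)


# Tip labels borrowed from the working-tree slides.
for neighbor in nni_neighbors("sp1", "sp4", "sp2", "sp3"):
    print(neighbor)
print(nni_neighborhood_size(5))  # 4
```

SPR and TBR examine progressively larger neighborhoods of the current tree, trading more computation per step for a lower chance of stalling on a local optimum.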

Other approaches

Tree-fusing - find two near optimal trees and exchange subgroups between the two trees

Genetic Algorithms - a simulation of evolution with a genotype that describes the tree and a fitness function that reflects the optimality of the tree

Disc Covering - upcoming paper

Slide42

Phylogenetic Accuracy?

Consistency - A phylogenetic method is consistent for a given evolutionary model if the method converges on the correct tree as the data available to the method become infinite.

Efficiency - Statistical efficiency is a measure of how quickly a method converges on the correct solution as more data are applied to the problem.

Robustness - Robustness refers to the degree to which violations of assumptions will affect performance of phylogenetic methods.

Slide43
Slide44

How reliable is MY phylogeny?

Bootstrap Analysis
Jackknife Analysis
Posterior Probabilities (Bayesian Approaches)
Decay Indices

Slide45

Bootstrap

Slide46

Pseudoreplicates
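A bootstrap pseudoreplicate is built by sampling alignment columns with replacement until the new alignment has as many columns as the original. The sketch below shows only that resampling step, using the five-species alignment from the earlier slides. The function name and the fixed random seed are illustrative choices, and the tree-building step applied to each pseudoreplicate is left out.

```python
import random

# Alignment copied from the "Estimate a Phylogeny" slides.
alignment = {
    "sp1": "ACCGTCTTGTTA",
    "sp2": "AGCGTCATCAAA",
    "sp3": "AGCGTCATCAAA",
    "sp4": "ACCGTCTTGATA",
    "sp5": "AGCCTCTTCATA",
}


def bootstrap_replicate(aln, rng):
    """Resample columns with replacement, keeping each column intact across taxa."""
    n_sites = len(next(iter(aln.values())))
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in picks) for name, seq in aln.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
pseudoreplicates = [bootstrap_replicate(alignment, rng) for _ in range(100)]

for name, seq in pseudoreplicates[0].items():
    print(name, seq)
```

Each pseudoreplicate would then be analyzed with the same tree-building method as the original data, and the proportion of pseudoreplicates recovering a given group is reported as that group's bootstrap support.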