Slide1
Phylogenetics
“Inferring Phylogenies”, Joseph Felsenstein
Excellent reference
Slide2
What is a phylogeny?
Slide3
Different Representations
Cladogram - branching pattern only
Phylogram - branch lengths are estimated and drawn proportional to the amount of change along the branch
Rooted - implies directionality of change
Unrooted - does not
How do you root a tree?
Slide4
What is a phylogeny used for?
Slide5
Estimate a Phylogeny
Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA
Slide6
Estimate a Phylogeny
Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA
Slide7
Working Tree
[Figure: tree with tips sp1, sp4, sp2, sp3, sp5; labels: c2]
Slide8
Estimate a Phylogeny
Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA
Slide9
Working Tree
[Figure: tree with tips sp1, sp4, sp2, sp3, sp5; labels: c2, c4]
Slide10
Estimate a Phylogeny
Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA
Slide11
Working Tree
[Figure: tree with tips sp1, sp4, sp2, sp3, sp5; labels: c2, c4, c7]
Slide12
Estimate a Phylogeny
Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA
Slide13
Working Tree
[Figure: tree with tips sp1, sp4, sp2, sp3, sp5; labels: c2, c4, c7, c9]
Slide14
Estimate a Phylogeny
Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA
Slide15
Working Tree
[Figure: tree with tips sp1, sp4, sp2, sp3, sp5; labels: c2, c4, c7, c9, c10]
Slide16
Estimate a Phylogeny
Sp1 ACCGTCTTGTTA
Sp2 AGCGTCATCAAA
Sp3 AGCGTCATCAAA
Sp4 ACCGTCTTGATA
Sp5 AGCCTCTTCATA
Slide17
Final Tree
[Figure: tree with tips sp1, sp4, sp2, sp3, sp5; labels: c2, c4, c7, c9, c10, c11]
Slide18
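The labels c2 through c11 on the working trees in the preceding slides appear to track the alignment columns that differ among Sp1 to Sp5. That reading is an inference from the slides, not something they state. A minimal Python sketch, with hypothetical function names, that lists the variable columns of the alignment shown above:

```python
# Alignment copied from the "Estimate a Phylogeny" slides.
ALIGNMENT = {
    "Sp1": "ACCGTCTTGTTA",
    "Sp2": "AGCGTCATCAAA",
    "Sp3": "AGCGTCATCAAA",
    "Sp4": "ACCGTCTTGATA",
    "Sp5": "AGCCTCTTCATA",
}


def variable_columns(alignment):
    """Return 1-based positions of columns that contain more than one state."""
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [
        pos + 1
        for pos in range(length)
        if len({seq[pos] for seq in sequences}) > 1
    ]


def column_pattern(alignment, position):
    """Map each taxon to its state at a 1-based alignment position."""
    return {taxon: seq[position - 1] for taxon, seq in alignment.items()}


if __name__ == "__main__":
    for pos in variable_columns(ALIGNMENT):
        print(f"c{pos}", column_pattern(ALIGNMENT, pos))
```

Columns in which every species shares the same base carry no grouping information, which is why only the variable columns would be mapped onto the tree.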
What optimality criteria do we use then?
Parsimony
Likelihood
Bayesian
Distance methods?
Slide19
Parsimony
Why should we choose a specific grouping?
Maximum parsimony: we should accept the hypothesis that explains the data most simply and efficiently
“Parsimony is simply the most robust criterion for choosing between competing scientific hypotheses. It is not a statement about how evolution may or may not have taken place” [1]
[1] Kitching, I. J.; Forey, P. L.; Humphries, C. J. & Williams, D. M. 1998. Cladistics: the theory and practice of parsimony analysis. The Systematics Association Publication No. 11.
Slide20
Parsimony
Optimality criterion that chooses the topology with the fewest character-state transformations
Optimizing one component: tree topology (pattern based)
Most parsimonious tree: the one (or several, if tied) with the minimum number of evolutionary changes (shortest tree length)
Slide21
Reconstructing trees via sequence data
Taxon  1 2 3 4 5 6
O      T G T A A T
A      A A T G A G
B      A G C C - G
C      A A T G A T
D      A G C C - T
[Figure: tree with tips A, O, D, C, B]
Character changes mapped on the tree:
1. T=>A
2. G=>A
3. T=>C
4. A=>G
4. A=>C
5. A=>GAP
6. T=>G
6. T=>G
Tree length = 8
Slide22
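The tree length of 8 in the sequence-data example above can be checked with Fitch's small-parsimony algorithm. The sketch below is a minimal illustration under stated assumptions: the topology (O as outgroup, A grouped with C, B grouped with D) is inferred from the listed changes rather than given on the slide, gaps are treated as a fifth state, and the function names are hypothetical.

```python
# Character matrix from the slide; "-" (gap) is treated as a fifth state.
MATRIX = {
    "O": "TGTAAT",
    "A": "AATGAG",
    "B": "AGCC-G",
    "C": "AATGAT",
    "D": "AGCC-T",
}

# Assumed topology (not stated on the slide): O is the outgroup,
# A pairs with C, and B pairs with D.
TREE = ("O", (("A", "C"), ("B", "D")))


def fitch(node, column, matrix):
    """Return (state_set, changes) for one character on a rooted binary tree."""
    if isinstance(node, str):
        # A leaf contributes its observed state and no changes.
        return {matrix[node][column]}, 0
    left_states, left_changes = fitch(node[0], column, matrix)
    right_states, right_changes = fitch(node[1], column, matrix)
    changes = left_changes + right_changes
    shared = left_states & right_states
    if shared:
        return shared, changes
    # Disjoint child sets: take the union and count one change.
    return left_states | right_states, changes + 1


def column_lengths(tree, matrix):
    """Minimum number of changes for each character on the given tree."""
    n_chars = len(next(iter(matrix.values())))
    return [fitch(tree, col, matrix)[1] for col in range(n_chars)]


def tree_length(tree, matrix):
    """Total parsimony length: the sum of per-character changes."""
    return sum(column_lengths(tree, matrix))


if __name__ == "__main__":
    print(column_lengths(TREE, MATRIX), tree_length(TREE, MATRIX))
```

Scoring a different topology with the same function, for example `("O", (("A", "B"), ("C", "D")))`, gives a longer tree, which is the comparison that the parsimony criterion relies on.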
Neighbor-joining Method
Slide23
NJ distance matrices
Slide24
NJ distance matrices
Slide25
NJ distance matrices
Slide26
NJ distance matrices
Slide27
Finished NJ tree
Slide28
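Models of Evolution
Pyrimidines: T, C
Purines: A, G
Transitions: changes within a class (purine to purine, or pyrimidine to pyrimidine)
Transversions: changes between classes (purine to pyrimidine, or the reverse)
Slide29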
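The transition/transversion distinction from the Models of Evolution slide can be expressed as a short classification routine. This is a minimal sketch with hypothetical function names; it uses the Sp1 and Sp2 sequences from earlier in the deck as input.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a, b):
    """Classify a pair of bases as identical, a transition, or a transversion."""
    a, b = a.upper(), b.upper()
    for base in (a, b):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not an unambiguous DNA base: {base!r}")
    if a == b:
        return "identical"
    # Same chemical class means a transition; different classes, a transversion.
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1, seq2):
    """Return (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    kinds = [classify_substitution(x, y) for x, y in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")


if __name__ == "__main__":
    # Sp1 and Sp2 from the "Estimate a Phylogeny" slides.
    print(count_ts_tv("ACCGTCTTGTTA", "AGCGTCATCAAA"))
```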
Maximum Likelihood
Base frequencies: $f_A + f_G + f_C + f_T = 1$
Base exchange: $f_s + f_v = 1$
R-matrix: six rate parameters that sum to 1
Gamma shape parameter
Number of discrete gamma-distribution categories
Pinvar: $f_{var} + f_{inv} = 1$
Likelihood: $L = \prod_i l_i$, where $i$ indexes each character
Slide30
Maximum Likelihood
$L = \Pr(D \mid H)$
[Figure: tree with internal nodes w, z, x, y; tip states G, C, G, G, A; branch lengths $t_1$ to $t_8$]
Slide31
ML cont.
The probability that the nucleotide at time $t$ is $i$ is given by [equation not preserved]
The probability that the nucleotide at time $t$ is $j$, $j \neq i$, is given by [equation not preserved]
Slide32
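The two probabilities named on the "ML cont." slide depend on the substitution model, and the slide's own equations are not available. As an illustration only, the sketch below assumes the Jukes-Cantor model, in which every base changes to each of the three others at the same rate `alpha` (an assumed parameter name): the probability of still being $i$ after time $t$ is $\tfrac14 + \tfrac34 e^{-4\alpha t}$, and the probability of being a specific $j \neq i$ is $\tfrac14 - \tfrac14 e^{-4\alpha t}$. It also shows the likelihood as a product over sites, computed as a sum of logs, for the Sp1 and Sp2 sequences from earlier in the deck.

```python
import math

BASES = "ACGT"

# Sp1 and Sp2 from the "Estimate a Phylogeny" slides.
SP1 = "ACCGTCTTGTTA"
SP2 = "AGCGTCATCAAA"


def jc69_probability(i, j, t, alpha=1.0):
    """P(base j at time t | base i at time 0) under Jukes-Cantor (assumed model).

    alpha is the rate of change to each of the three other bases.
    """
    decay = math.exp(-4.0 * alpha * t)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


def pairwise_log_likelihood(seq1, seq2, t, alpha=1.0):
    """log L = sum over sites of log(1/4 * P(x -> y, t)); requires t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(
        math.log(0.25 * jc69_probability(x, y, t, alpha))
        for x, y in zip(seq1, seq2)
    )


if __name__ == "__main__":
    for t in (0.05, 0.2, 1.0):
        print(t, pairwise_log_likelihood(SP1, SP2, t))
```

Working in log space avoids numerical underflow when many small per-site likelihoods are multiplied together.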
Bayes Theorem
$$\mathrm{Prob}(H \mid D) = \frac{\mathrm{Prob}(H)\,\mathrm{Prob}(D \mid H)}{\mathrm{Prob}(D)}$$
H = Hypothesis
D = Data
Prob(H): prior probability or marginal probability of H
Prob(H│D): the conditional probability of H given D: posterior probability
Prob(D│H): likelihood function
Prob(D): prior probability or marginal probability of D, $\mathrm{Prob}(D) = \sum_H P(H)\,P(D \mid H)$
Normalizing constant: ensures $\sum_H P(H \mid D) = 1$
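Bayes' theorem over a finite set of hypotheses can be sketched in a few lines. The hypothesis names and the prior and likelihood values below are hypothetical numbers chosen for illustration, not results from any analysis.

```python
def posterior(priors, likelihoods):
    """Apply Bayes' theorem over a discrete set of hypotheses.

    priors:      dict mapping hypothesis -> Prob(H)
    likelihoods: dict mapping hypothesis -> Prob(D | H)
    """
    if priors.keys() != likelihoods.keys():
        raise ValueError("priors and likelihoods must cover the same hypotheses")
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())  # Prob(D), the normalizing constant
    if evidence == 0:
        raise ValueError("the data have zero probability under every hypothesis")
    return {h: value / evidence for h, value in joint.items()}


if __name__ == "__main__":
    # Hypothetical example: three candidate trees with a flat prior.
    priors = {"T1": 1 / 3, "T2": 1 / 3, "T3": 1 / 3}
    likelihoods = {"T1": 0.006, "T2": 0.003, "T3": 0.001}
    print(posterior(priors, likelihoods))
```

With a flat prior the posterior is simply the likelihood rescaled to sum to 1, which makes the role of the normalizing constant easy to see.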
Slide33
Take Home Message
Likelihood: represents the probability of the data given the hypothesis => difficult to interpret
Bayes approach: estimates the probability of the hypothesis given the data => estimates the probability for the hypothesis of interest
Slide34
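Bayesian Inference of Phylogeny
Calculating the posterior probability (pP) of a tree involves a summation over all possible trees and, for each tree, integration over all combinations of branch lengths (bl) and substitution-model parameter values.
Here $\tau_i$ denotes the $i$-th tree, $\upsilon_i$ its branch lengths, $\theta$ the substitution-model parameters, $X$ the data, and $B(s)$ the number of possible trees for $s$ species.
$$f(\tau_i \mid X) = \frac{f(\tau_i)\, f(X \mid \tau_i)}{\sum_{j=1}^{B(s)} f(\tau_j)\, f(X \mid \tau_j)}$$
$$f(\tau_i, \upsilon_i, \theta \mid X) = \frac{f(\tau_i, \upsilon_i, \theta)\, f(X \mid \tau_i, \upsilon_i, \theta)}{\sum_{j=1}^{B(s)} \int_{\upsilon,\theta} f(\tau_j, \upsilon_j, \theta)\, f(X \mid \tau_j, \upsilon_j, \theta)\, d\upsilon\, d\theta}$$
$$f(\tau_i \mid X) = \frac{\int_{\upsilon,\theta} f(\tau_i, \upsilon_i, \theta)\, f(X \mid \tau_i, \upsilon_i, \theta)\, d\upsilon\, d\theta}{\sum_{j=1}^{B(s)} \int_{\upsilon,\theta} f(\tau_j, \upsilon_j, \theta)\, f(X \mid \tau_j, \upsilon_j, \theta)\, d\upsilon\, d\theta}$$
Inferences of any single parameter are based on the marginal distribution of the parameter
The marginal probability distribution of the topology, for example, integrates out all the other parameters
Advantage: the power of the analysis is focused on the parameter of interest (i.e., the topology of the tree)
Slide35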
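Integrating out the other parameters, as described for the marginal distribution of the topology, can be illustrated numerically. The sketch below is a toy only: it uses three hypothetical topologies, a single branch-length parameter, and a made-up unnormalized joint density (prior times likelihood) that is not derived from any substitution model. The integral is approximated with the trapezoid rule on a grid.

```python
import math

# Hypothetical decay rates standing in for prior x likelihood on each topology.
TOY_RATES = {"T1": 1.0, "T2": 2.0, "T3": 4.0}


def toy_joint_density(topology, branch_length):
    """Made-up unnormalized joint density; not a real evolutionary model."""
    return math.exp(-TOY_RATES[topology] * branch_length)


def trapezoid(ys, xs):
    """Trapezoid-rule approximation of the integral of y over x."""
    return sum(
        0.5 * (ys[k] + ys[k + 1]) * (xs[k + 1] - xs[k])
        for k in range(len(xs) - 1)
    )


def marginal_topology_posterior(joint_density, topologies, grid):
    """Integrate the branch length out for each topology, then normalize."""
    areas = {
        tau: trapezoid([joint_density(tau, v) for v in grid], grid)
        for tau in topologies
    }
    total = sum(areas.values())  # sum over topologies of the integrals
    return {tau: area / total for tau, area in areas.items()}


GRID = [k * 0.001 for k in range(20001)]  # branch lengths from 0 to 20
POSTERIOR = marginal_topology_posterior(toy_joint_density, TOY_RATES, GRID)

if __name__ == "__main__":
    print(POSTERIOR)
```

Real analyses involve far too many trees and parameters for grid integration, so this only shows what the marginal means, not how it is computed in practice.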
Estimating phylogenies
Exhaustive Searches
Branch and bound methods
Rise in computational time versus rise in solution space
Slide36
How many topologies are there?
Slide37
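The question of how many topologies exist has a closed-form answer for strictly bifurcating trees: $(2n-5)!!$ unrooted trees and $(2n-3)!!$ rooted trees for $n$ labeled taxa. A minimal Python sketch, with hypothetical function names, that evaluates these counts:

```python
def odd_double_factorial(k):
    """Product k * (k - 2) * ... * 3 * 1 for odd k; returns 1 when k <= 1."""
    result = 1
    for factor in range(k, 1, -2):
        result *= factor
    return result


def num_unrooted_trees(n_taxa):
    """Number of unrooted bifurcating trees for n labeled taxa: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return odd_double_factorial(2 * n_taxa - 5)


def num_rooted_trees(n_taxa):
    """Number of rooted bifurcating trees for n labeled taxa: (2n - 3)!!"""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return odd_double_factorial(2 * n_taxa - 3)


if __name__ == "__main__":
    for n in (4, 5, 10, 20, 50):
        print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

The counts grow faster than exponentially with the number of taxa, which is why exhaustive searches stop being practical beyond small data sets.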
The Phylogenetic Problem
Slide38
HIV-1 Whole Genomes
1993 - 15
2003 (JAN) - 397
Slide39
Tree Space - the final frontier
Slide40
Heuristic Searches
Nearest-neighbor interchanges (NNI) - swap two adjacent branches on the tree
Subtree pruning and regrafting (SPR) - remove a branch from the tree (either an interior or an exterior branch) with a subtree attached to it. The subtree is then reinserted into the remaining tree in all possible places.
Tree bisection and reconnection (TBR) - an interior branch is broken, and the two resulting fragments of the tree are considered as separate trees. All possible connections are made between a branch of one and a branch of the other.
Slide41
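Of the rearrangements listed under Heuristic Searches, NNI is the simplest to write down: an interior branch separates four subtrees, and swapping one subtree across that branch yields exactly two alternative arrangements. The sketch below represents subtrees as nested tuples or taxon names; the representation and the function name are assumptions made for illustration.

```python
def nni_neighbors(a, b, c, d):
    """Return the two NNI rearrangements around one interior branch.

    The current arrangement is ((a, b), (c, d)): subtrees a and b sit on one
    side of the interior branch, c and d on the other. Each neighbor swaps
    one subtree from each side.
    """
    return [
        ((a, c), (b, d)),  # b and c exchanged
        ((a, d), (b, c)),  # b and d exchanged
    ]


if __name__ == "__main__":
    # Subtrees can be single taxa or larger nested groups.
    for neighbor in nni_neighbors("sp1", "sp4", ("sp2", "sp3"), "sp5"):
        print(neighbor)
```

A full NNI search would apply this to every interior branch of the current tree and keep any rearrangement that improves the optimality score.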
Other approaches
Tree-fusing - find two near optimal trees and exchange subgroups between the two trees
Genetic Algorithms - a simulation of evolution with a genotype that describes the tree and a fitness function that reflects the optimality of the tree
Disc Covering - upcoming paper
Slide42
Phylogenetic Accuracy?
Consistency - A phylogenetic method is consistent for a given evolutionary model if the method converges on the correct tree as the data available to the method become infinite.
Efficiency - Statistical efficiency is a measure of how quickly a method converges on the correct solution as more data are applied to the problem.
Robustness - Robustness refers to the degree to which violations of assumptions will affect performance of phylogenetic methods.
Slide43
Slide44
How reliable is MY phylogeny?
Bootstrap Analysis
Jackknife Analysis
Posterior Probabilities (Bayesian Approaches)
Decay Indices
Slide45
Bootstrap
Slide46
Pseudoreplicates
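A bootstrap pseudoreplicate is built by sampling alignment columns with replacement until the replicate is as long as the original alignment. The sketch below is a minimal illustration with hypothetical function names, using the five-species alignment from earlier in the deck; building a tree from each pseudoreplicate and tallying clade support is left out.

```python
import random

# Alignment copied from the "Estimate a Phylogeny" slides.
ALIGNMENT = {
    "Sp1": "ACCGTCTTGTTA",
    "Sp2": "AGCGTCATCAAA",
    "Sp3": "AGCGTCATCAAA",
    "Sp4": "ACCGTCTTGATA",
    "Sp5": "AGCCTCTTCATA",
}


def bootstrap_pseudoreplicate(alignment, rng):
    """Resample alignment columns with replacement, keeping rows aligned."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        taxon: "".join(seq[c] for c in columns)
        for taxon, seq in alignment.items()
    }


def bootstrap(alignment, n_replicates, seed=0):
    """Generate n_replicates pseudoreplicates from a seeded random generator."""
    rng = random.Random(seed)
    return [bootstrap_pseudoreplicate(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    for replicate in bootstrap(ALIGNMENT, 3, seed=1):
        for taxon, seq in replicate.items():
            print(taxon, seq)
        print()
```

The same column indices are used for every taxon within a replicate, so each resampled column is still a complete character from the original data.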