Lecture 8 – Searching Tree Space
The Search Tree
A. Nearest-neighbor interchange (NNI)
There are 2(n – 3) NNI rearrangements for any fully resolved, unrooted tree of n taxa.
B. Subtree Pruning-Regrafting (SPR)
There are 4(n – 3)(n – 2) SPR rearrangements.
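To give a feel for how quickly these neighborhoods grow, here is a small sketch that evaluates the two formulas above. It assumes n is the number of taxa on a fully resolved, unrooted tree; the function names `nni_count` and `spr_count` are made up for this illustration.

```python
def nni_count(n: int) -> int:
    """NNI rearrangements for n taxa, using the formula 2(n - 3)."""
    if n < 4:
        return 0  # fewer than 4 taxa: no internal branch to swap around
    return 2 * (n - 3)


def spr_count(n: int) -> int:
    """SPR rearrangements for n taxa, using the formula 4(n - 3)(n - 2)."""
    if n < 4:
        return 0
    return 4 * (n - 3) * (n - 2)


for n in (5, 10, 26):
    print(n, nni_count(n), spr_count(n))
```

For 26 taxa this gives 46 NNI rearrangements and 2,208 SPR rearrangements per tree.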
C. Tree bisection-reconnection (TBR)
Each branch is bisected, and the resulting subtrees are reconnected in all possible ways. There is no general formula for the number of TBR rearrangements for a tree of a given size (as there is for NNI and SPR), but TBR swapping searches tree space more thoroughly than SPR or NNI.
How greedy should we be?
With a 26-taxon data set, let's first be very greedy: ignore ties in building the starting tree and in swapping.
NNI: examine 42 trees
SPR: examine 2,072 trees
TBR: examine 5,816 trees
Less greedy: save all equally optimal trees at each step.
NNI: examine 140 trees
SPR: examine 6,212 trees
TBR: examine 16,604 trees
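The difference between the two strategies can be sketched on a toy landscape. This is not how any particular program implements swapping: integers stand in for trees, a score list stands in for tree length, and the names `hill_climb`, `SCORES`, and `neighbors` are hypothetical. The only point is that saving ties lets the search cross a plateau, at the cost of examining more candidates.

```python
def hill_climb(start, score, neighbors, keep_ties):
    """Minimize score; optionally hold and swap on all equally good states."""
    best = score(start)
    optimal = [start]   # states currently held as best
    queue = [start]     # held states whose neighbors are not yet examined
    seen = {start}      # every state examined so far
    while queue:
        current = queue.pop()
        for cand in neighbors(current):
            if cand in seen:
                continue
            seen.add(cand)
            s = score(cand)
            if s < best:
                # Strictly better: drop everything held and restart from it.
                best, optimal, queue = s, [cand], [cand]
                break
            if s == best and keep_ties:
                # Equally good: hold it and swap on it later.
                optimal.append(cand)
                queue.append(cand)
    return best, optimal, len(seen)


# Toy landscape: a plateau of score 4 separates the start from score 3.
SCORES = [5, 4, 4, 4, 3, 6]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(SCORES)]


print(hill_climb(0, lambda i: SCORES[i], neighbors, keep_ties=False))
print(hill_climb(0, lambda i: SCORES[i], neighbors, keep_ties=True))
```

On this landscape the very greedy search stops on the plateau after examining 3 states, while the tie-saving search examines all 6 and reaches the lower score.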
Random Addition Sequence and Tree Islands
So in the above example, using the least greedy strategies and using starting trees
generated by 100 random addition sequences, we’ll look at 341,355 different trees.
Island   Size   First tree   Last tree   Score   First replicate   Times hit
------------------------------------------------------------------------------
1        2      1            2           278     1                 99
2        1      -            -           279     97                1
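The bookkeeping behind a table like the one above can be sketched as follows. This is a toy, not the output routine of any real program: integers stand in for trees, each local optimum stands in for an island (a real island is a set of equally optimal trees connected by swaps), and `climb`, `island_table`, and `SCORES` are hypothetical names.

```python
import random


def climb(state, score, neighbors):
    """Steepest descent: move to the best improving neighbor until stuck."""
    while True:
        better = [c for c in neighbors(state) if score(c) < score(state)]
        if not better:
            return state
        state = min(better, key=score)


def island_table(starts, score, neighbors):
    """Tally which optimum each replicate reaches, and when it was first hit."""
    islands = {}
    for rep, start in enumerate(starts, 1):
        optimum = climb(start, score, neighbors)
        row = islands.setdefault(
            optimum, {"score": score(optimum), "first_rep": rep, "hits": 0}
        )
        row["hits"] += 1
    return islands


# Toy landscape with two local optima (scores 279 and 278).
SCORES = [280, 279, 281, 278, 279, 280]


def score(i):
    return SCORES[i]


def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(SCORES)]


rng = random.Random(1)
starts = [rng.randrange(len(SCORES)) for _ in range(100)]  # 100 replicates
for optimum, row in island_table(starts, score, neighbors).items():
    print(optimum, row)
```

Each random start plays the role of one random addition sequence replicate; the "hits" column shows how unevenly replicates can be distributed across optima.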
Transforming Tree Space
We may be better off spending less effort searching on one island and more effort searching for multiple islands.
Parsimony Ratchet
(Nixon. 1999. Cladistics 15:407)
Alternate between searches using the real data and searches on a perturbed data set.
Get a starting tree by stepwise addition from the real data.
Reweight a random set (20-25%) of the characters; this transforms tree space.
Hill climb from the starting tree via greedy TBR with the perturbed data.
If a better tree is found, use that tree to start TBR using the original data.
This is iterated a couple hundred times.
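The ratchet loop described above can be sketched on a toy problem. Everything here is an assumption made for illustration: integers stand in for trees, a small cost table stands in for character lengths on each tree, "reweighting" is taken to mean giving the chosen characters weight 2, and a simple neighbor search stands in for TBR. The names `perturb_weights`, `hill_climb`, `ratchet`, and `COSTS` are hypothetical.

```python
import random


def perturb_weights(n_chars, fraction, rng):
    """Weights of 1, with a random fraction of characters upweighted to 2."""
    weights = [1] * n_chars
    k = max(1, round(fraction * n_chars))
    for i in rng.sample(range(n_chars), k):
        weights[i] = 2
    return weights


def hill_climb(state, score, neighbors):
    """Greedy descent: take the first improving neighbor until none exists."""
    improved = True
    while improved:
        improved = False
        for cand in neighbors(state):
            if score(cand) < score(state):
                state, improved = cand, True
                break
    return state


def ratchet(start, weighted_score, neighbors, n_chars, iterations, rng, fraction=0.2):
    uniform = [1] * n_chars

    def real(s):
        return weighted_score(s, uniform)

    best = hill_climb(start, real, neighbors)
    for _ in range(iterations):
        w = perturb_weights(n_chars, fraction, rng)
        # Search on the perturbed data, then return to the original data.
        perturbed = hill_climb(best, lambda s: weighted_score(s, w), neighbors)
        candidate = hill_climb(perturbed, real, neighbors)
        if real(candidate) < real(best):
            best = candidate
    return best


# Toy data: COSTS[c][t] is the cost of character c on "tree" t.
COSTS = [[1, 2, 0, 1], [2, 1, 2, 1], [2, 1, 2, 1], [1, 0, 1, 0]]


def weighted_score(state, weights):
    return sum(w * row[state] for w, row in zip(weights, COSTS))


def neighbors(state):
    return [s for s in (state - 1, state + 1) if 0 <= s < 4]


best = ratchet(0, weighted_score, neighbors, n_chars=4, iterations=100,
               rng=random.Random(7), fraction=0.25)
print(best, weighted_score(best, [1] * 4))
```

With these toy costs, plain hill climbing from state 0 stalls at state 1 (score 4). Upweighting character 0 tilts the landscape enough to walk from state 1 to state 3 (score 3), which is the kind of escape the ratchet is meant to provide.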
Simulated Annealing
Designed to search a large, complex, discrete search space.
Laura Salter Kubatko was one of the first to apply it to phylogenies as a means of estimating ML trees (Salter and Pearl, 2001. Syst. Biol. 50:7).
It is an MCMC approach to searching tree space that permits down-hill moves.
Steps:
Generate an initial state (a starting tree). Initially, a random tree was used.
Propose a stochastic change to the current state (usually a minor change). This was initially derived via a random NNI.
If the proposal improves the tree (has a better ML score), the move is accepted.
Proposals (NNIs) that degrade the tree are accepted with a small probability that shrinks the worse the proposed tree is.
Early on, the acceptance probability is high; it decreases as the search runs.
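The steps above can be sketched generically. The exponential (Metropolis-style) acceptance rule and the geometric cooling schedule used here are common textbook choices and are assumptions, not necessarily the schedule used in the cited paper. Integers stand in for trees, and `accept`, `anneal`, `propose`, and `LOG_LIK` are hypothetical names.

```python
import math
import random


def accept(delta, temperature, rng):
    """delta is proposed minus current log-likelihood (higher is better)."""
    if delta >= 0:
        return True  # improvements (and ties) are always accepted
    # Worse proposals: probability falls with how much worse they are,
    # and falls further as the temperature drops.
    return rng.random() < math.exp(delta / temperature)


def anneal(start, log_lik, propose, rng, t0=1.0, cooling=0.95, steps=500):
    current = best = start
    temperature = t0
    for _ in range(steps):
        cand = propose(current, rng)
        if accept(log_lik(cand) - log_lik(current), temperature, rng):
            current = cand
            if log_lik(current) > log_lik(best):
                best = current  # remember the best state ever visited
        temperature *= cooling  # acceptance of bad moves becomes rarer
    return best


# Toy landscape: state 1 is a local optimum, state 3 is the global one.
LOG_LIK = [-10.0, -8.0, -9.0, -5.0, -7.0]


def propose(state, rng):
    """A minor random change: step one position left or right."""
    step = rng.choice((-1, 1))
    return min(max(state + step, 0), len(LOG_LIK) - 1)


best = anneal(0, lambda s: LOG_LIK[s], propose, random.Random(1))
print(best, LOG_LIK[best])
```

Tracking `best` separately from `current` is a design choice for this sketch: because down-hill moves are allowed, the chain's final state need not be the best state it visited.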
RAxML & Alternating Criteria
Stamatakis permits use of a modified simulated annealing in RAxML.
First, he starts with a tree generated by stepwise addition using parsimony (randomized addition sequence).
Second, the SA approach can be used to alter topology via (lazy) SPR under ML, but only the branches involved in the swapping are reoptimized.
Third, RAxML builds proposals to alter branch lengths and model parameters that are only accepted if they improve the likelihood (i.e., this aspect of the search is entirely hill climbing). These optimizations are cursory and halted with a liberal stopping rule.
This approach allows pretty thorough searches of tree space really quickly, which permits us to estimate ML trees for very large data sets (e.g., thousands of taxa).
Genetic Algorithms
Paul Lewis (1998. Mol. Biol. Evol. 15:277)
There are n individuals: each is a tree with model parameters and branch lengths.
Individuals are ranked by their likelihood. The tree with the highest fitness leaves k offspring in the next generation. Other trees leave offspring in proportion to their rank.
All offspring are subject to branch-length and model mutations.
Some offspring ((n-1)/m) are subject to random SPR mutations.
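The selection step can be sketched as below. This is one plausible reading of the description above, not the published implementation: the best individual gets exactly k of the n offspring slots, and the remaining slots are drawn from the other individuals with probability proportional to rank (worst = 1). The function name `choose_parents` and the example log-likelihoods are made up; mutation is left out.

```python
import random


def choose_parents(log_liks, k, rng):
    """Return n parent indices: k copies of the best, the rest by rank."""
    n = len(log_liks)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the population size")
    order = sorted(range(n), key=lambda i: log_liks[i])  # worst ... best
    best, others = order[-1], order[:-1]
    parents = [best] * k
    if n > k:
        ranks = list(range(1, n))  # worst gets 1, second best gets n - 1
        parents += rng.choices(others, weights=ranks, k=n - k)
    return parents


rng = random.Random(42)
parents = choose_parents([-120.5, -100.2, -110.0, -130.9], k=2, rng=rng)
print(parents)
```

Each returned index would then be copied and mutated to form the next generation.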
Genetic Algorithms
Recombination searches tree space broadly.
GARLi was written by Derek Zwickl and modifies Lewis’ GA.
Topological mutations include NNI & SPR rearrangements and some local SPR.
Starting trees are generated via stepwise addition with random addition sequences.
This approach allows thorough searches of tree space for up to a couple thousand taxa.