CS 598 AGB: Supertrees
Tandy Warnow

Today’s Material
Supertree construction: given a set of trees on subsets of S (the full set of taxa), construct a tree on the full set S.
Textbook material: Chapter 5 (Aho, Sagiv, Szymanski, and Ullman) and Sections 8.1-8.3.

Computing a tree from a set of rooted triplet trees
Construct a rooted tree from a set of compatible rooted triplet trees; equivalently, test whether a set of rooted triplet trees is compatible.
Recursive algorithm by Aho, Sagiv, Szymanski, and Ullman (Section 5.1)

ASSU algorithm
Given a set X of k triplet trees on n species:
If n = 1, return the single species as a one-leaf tree.
If n > 1, construct a graph with one vertex per species and an edge (a,b) for every triplet ab|c in X.
If the graph has a single component, reject (the set is not compatible); else
recurse on each component, using only the triplets whose three species all lie in that component, and return the tree formed by attaching the rooted trees on the components as subtrees of a new root.

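The recursion above can be sketched in Python as follows. This is a minimal sketch, assuming a triplet ab|c is stored as the tuple (a, b, c) and trees are returned as nested tuples with species names as leaves; the names `assu` and `_components` are hypothetical, and components are found with a small union-find.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

Triplet = Tuple[str, str, str]  # (a, b, c) encodes the rooted triplet ab|c
Tree = Union[str, tuple]        # nested tuples; a leaf is a species name


def _components(species: Set[str], triplets: List[Triplet]) -> List[Set[str]]:
    """Connected components of the graph with one edge (a, b) per triplet ab|c."""
    parent = {s: s for s in species}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in triplets:
        parent[find(a)] = find(b)
    groups: Dict[str, Set[str]] = {}
    for s in species:
        groups.setdefault(find(s), set()).add(s)
    return list(groups.values())


def assu(species: Set[str], triplets: List[Triplet]) -> Optional[Tree]:
    """Return a rooted tree that agrees with every triplet, or None if incompatible."""
    if len(species) == 1:
        return next(iter(species))
    comps = _components(species, triplets)
    if len(comps) == 1:
        return None  # connected graph: the triplets are not compatible
    subtrees = []
    for comp in sorted(comps, key=sorted):  # sorted only for deterministic output
        # keep only the triplets whose three species all lie in this component
        sub = [t for t in triplets if set(t) <= comp]
        subtree = assu(comp, sub)
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)
```

For example, `assu({"a", "b", "c", "d"}, [("a", "b", "c"), ("c", "d", "a")])` returns `(("a", "b"), ("c", "d"))`, while `assu({"a", "b", "c"}, [("a", "b", "c"), ("b", "c", "a")])` returns `None` because the graph is connected.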
Why does it work?
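If the set X of triplet trees is compatible, then there is a rooted tree T that agrees with every triplet in X, with at least two subtrees off the root, T1 and T2.
Two leaves a and b in different subtrees (a in T1, b in T2) cannot be the sibling pair of a triplet ab|c in X, because their least common ancestor in T is the root.
Hence every edge of the graph joins two leaves in the same subtree, so the graph is not connected: it has at least two components.
Every subset of X is also compatible, so this argument applies at every level of the recursion, and the algorithm never rejects.
The returned tree agrees with every triplet ab|c in X: a and b are joined by an edge, so they stay in the same component until the recursive call in which c is separated from them.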
If the set X of triplet trees is not compatible, it is not hard to show that the algorithm will detect this (proof by induction on the number of taxa).

Compatibility of rooted trees
Suppose the input is a set X of rooted trees (not necessarily triplet trees).
Can we use ASSU to determine if X is compatible, and to compute a compatibility supertree for X?
Solution: Yes. Encode each rooted tree in X by its set of rooted triplet trees (or a subset of these that suffices to define the tree), and then run ASSU on the union of these sets.

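The encoding step can be sketched as below, assuming rooted trees are written as nested tuples and a triplet ab|c is the tuple (a, b, c); `leaves` and `rooted_triplets` are hypothetical names. A triplet ab|c is displayed exactly when, at the node where a, b, and c first meet, a and b lie below the same child and c below a different one, which is what the loop checks.

```python
from itertools import combinations
from typing import Set, Tuple, Union

Tree = Union[str, tuple]        # rooted tree as nested tuples; leaves are names
Triplet = Tuple[str, str, str]  # (a, b, c) encodes ab|c, with a < b


def leaves(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def rooted_triplets(tree: Tree) -> Set[Triplet]:
    """All rooted triplets ab|c displayed by the tree."""
    result: Set[Triplet] = set()

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        child_sets = [leaves(child) for child in node]
        for i, inside in enumerate(child_sets):
            # a, b below the same child and c below a different child => ab|c
            outside = set().union(*(s for j, s in enumerate(child_sets) if j != i))
            for a, b in combinations(sorted(inside), 2):
                for c in sorted(outside):
                    result.add((a, b, c))
        for child in node:
            visit(child)

    visit(tree)
    return result
```

This sketch enumerates every displayed triplet, which is cubic in the number of leaves per tree; as the slide notes, a smaller subset that still defines each tree would suffice.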
Summary so far
Testing compatibility of an arbitrary set of rooted trees (and constructing a compatibility supertree): polynomial time, using ASSU
Testing compatibility of an arbitrary set of unrooted trees (and constructing a compatibility supertree): NP-complete!
Special cases for testing compatibility of unrooted trees:
Input has a tree on every four taxa. (Solution: use the All Quartets Method to test for compatibility.)
Input trees all have a common species, A. (Solution: root all the input trees using leaf A, and then run ASSU.)
Input has all the “short quartets” of a tree. (Solution: use Dyadic Closure to test for compatibility; see Chapter 13.)

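For the common-species case, the rooting step might look like the following sketch. It assumes each unrooted tree is given as an adjacency dictionary (every node mapped to its neighbours, leaves having degree 1) and returns the rooted tree on the remaining leaves, with A removed, as nested tuples; `root_at_leaf` is a hypothetical name. The rooted trees can then be encoded as triplets and passed to ASSU.

```python
from typing import Dict, List, Union

Tree = Union[str, tuple]  # rooted tree as nested tuples; leaves are names


def root_at_leaf(adj: Dict[str, List[str]], outgroup: str) -> Tree:
    """Root an unrooted tree at the leaf `outgroup` and return the rooted
    tree on the other leaves (the outgroup itself is dropped)."""
    if len(adj[outgroup]) != 1:
        raise ValueError("outgroup must be a leaf")
    start = adj[outgroup][0]  # the node the outgroup hangs from becomes the root

    def build(node: str, parent: str) -> Tree:
        children = [n for n in adj[node] if n != parent]
        if not children:
            return node  # a leaf of the original tree
        return tuple(build(child, node) for child in children)

    return build(start, outgroup)
```

For the quartet tree Ab|cd, with internal nodes u and v, rooting at A yields `("b", ("c", "d"))`, that is, the rooted triplet cd|b.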
Supertree Methods
Most of the time, the input is a set of unrooted source trees that is incompatible.
All the methods described so far only return compatibility supertrees.
How can we construct supertrees from incompatible source trees?

Supertree estimation
Challenges:
Tree compatibility is NP-complete (so even if the source trees are correct, supertree estimation is hard)
Estimated source trees have error
Advantages:
Estimating individual gene trees can be computationally feasible (compared to the combined analysis of many genes)
Can use different types of data for each source tree

Many Supertree Methods
MRP
weighted MRP
MRF
MRD
Robinson-Foulds Supertrees
Min-Cut
Modified Min-Cut
Semi-strict Supertree
QMC
Q-imputation
SDM
PhySIC
Majority-Rule Supertrees
Maximum Likelihood Supertrees
and many more ...
Matrix Representation with Parsimony (most commonly used and most accurate)

Supertree Optimization Problems
MRP (Matrix Representation with Parsimony)
MRL (Matrix Representation with Likelihood)
RFS (Robinson-Foulds Supertree)
MQDS (Minimum Quartet Distance Supertree)
All of these problems are NP-hard, but some of them have good heuristics.
It is easy to see that if the input source trees are compatible, then MRP, RFS, and MQDS (solved exactly) return a compatibility supertree.

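MRP works on a binary matrix built from the source trees; the sketch below shows only that encoding step, using the standard matrix representation: one character per nontrivial split of each source tree, with 1 for taxa on one side, 0 for taxa on the other side, and ? for taxa absent from that tree. It assumes source trees are written as nested tuples with an arbitrary root; `mrp_matrix` and `_collect` are hypothetical names. The NP-hard parsimony search on the resulting matrix is not shown and would be run by a separate heuristic tool.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

Tree = Union[str, tuple]  # an unrooted source tree written as nested tuples


def _collect(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the leaf set below every node to `out`; return the whole leaf set."""
    if isinstance(tree, str):
        below = frozenset([tree])
    else:
        below = frozenset().union(*(_collect(child, out) for child in tree))
    out.append(below)
    return below


def mrp_matrix(source_trees: List[Tree]) -> Dict[str, str]:
    """One binary character per nontrivial split of each source tree."""
    per_tree: List[Tuple[FrozenSet[str], List[FrozenSet[str]]]] = []
    for tree in source_trees:
        nodes: List[FrozenSet[str]] = []
        per_tree.append((_collect(tree, nodes), nodes))
    taxa = sorted(set().union(*(leafset for leafset, _ in per_tree)))
    rows: Dict[str, List[str]] = {x: [] for x in taxa}
    for leafset, nodes in per_tree:
        seen: Set[FrozenSet[str]] = set()
        for side in nodes:
            other = leafset - side
            if len(side) < 2 or len(other) < 2:
                continue  # trivial split (or the whole tree): no grouping information
            key = min(side, other, key=sorted)  # same split reached from either side
            if key in seen:
                continue
            seen.add(key)
            for x in taxa:
                rows[x].append("1" if x in side else ("0" if x in leafset else "?"))
    return {x: "".join(chars) for x, chars in rows.items()}
```

For the source trees ab|cd and ab|ce, the matrix rows are a: `11`, b: `11`, c: `00`, d: `0?`, e: `?0`.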
FN rate of MRP vs. combined analysis
Figure: FN rate plotted against scaffold density (%)

Comparison of Supertree methods and Concatenation
From Swenson et al., Algorithms for Molecular Biology 2010
http://almob.biomedcentral.com/articles/10.1186/1748-7188-5-8

Comparison of Supertree Methods
Swenson et al., Algorithms for Molecular Biology 2011
http://almob.biomedcentral.com/articles/10.1186/1748-7188-6-7

SuperFine
SuperFine is a technique for improving the speed and accuracy of supertree methods.
The first step computes a “strict consensus merger” (SCM) of the input trees, and the second step refines the SCM using a base supertree method.
The SCM calculation is very fast. The refinement step is applied to each polytomy (node with degree greater than 3) independently, and is fast when the degree is small.

SuperFine-boosting: improves accuracy of MRP
Figure: results plotted against scaffold density (%) (Swenson et al., Syst. Biol. 2012)

SuperFine
First, construct a supertree with low false positives: the Strict Consensus Merger (SCM).
Then, refine the tree to reduce false negatives by resolving each polytomy using a “base” supertree method (e.g., MRP or Quartet Max Cut).

Theoretical results for SCM
SCM can be computed in polynomial time.
For certain types of inputs, the SCM method solves the NP-hard “Tree Compatibility” problem.
All splits in the SCM “appear” in at least one source tree (and are not contradicted by any source tree).

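Whether a source tree contradicts a split can be checked with the standard pairwise test: two bipartitions A|B and C|D of the same leaf set are compatible exactly when at least one of A∩C, A∩D, B∩C, B∩D is empty. The sketch below assumes each split is given by one of its sides as a frozenset and restricts the supertree split to the source tree's leaves before testing; `compatible` and `contradicted` are hypothetical names.

```python
from typing import FrozenSet, Iterable


def compatible(side1: FrozenSet[str], side2: FrozenSet[str], leaves: FrozenSet[str]) -> bool:
    """Are the bipartitions side1|rest and side2|rest, restricted to `leaves`, compatible?"""
    a, c = side1 & leaves, side2 & leaves
    b, d = leaves - a, leaves - c
    # compatible iff at least one of the four side intersections is empty
    return not (a & c) or not (a & d) or not (b & c) or not (b & d)


def contradicted(side: FrozenSet[str], source_leaves: FrozenSet[str],
                 source_sides: Iterable[FrozenSet[str]]) -> bool:
    """True if some split of the source tree is incompatible with the
    supertree split `side | rest` restricted to the source tree's leaves."""
    return any(not compatible(side, s, source_leaves) for s in source_sides)
```

For instance, a supertree split {a, b, e} | rest, restricted to a source tree on {a, b, c, d}, becomes ab|cd; it is contradicted by a source split ac|bd but not by ab|cd.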
Comparing Supertree Methods on 1000-taxon datasets
Figure 1 from Nguyen, Mirarab, and Warnow, Algorithms for Molecular Biology 2012
http://almob.biomedcentral.com/articles/10.1186/1748-7188-7-3

Obtaining a supertree with low FP
The Strict Consensus Merger (SCM)
SCM of two trees:
Computes the strict consensus on the common leaf set
Then superimposes the two trees, contracting more edges in the presence of “collisions”

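The first phase of the SCM of two trees, the strict consensus on the common leaf set, can be sketched as follows. It assumes unrooted trees written as nested tuples with an arbitrary root and represents each split by the side that does not contain the smallest common leaf; the function names are hypothetical. The superimposition step and the handling of collisions are not shown.

```python
from typing import FrozenSet, List, Set, Union

Tree = Union[str, tuple]  # an unrooted tree written as nested tuples with an arbitrary root


def _leaf_sets(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the leaf set below every node to `out`; return the whole leaf set."""
    if isinstance(tree, str):
        below = frozenset([tree])
    else:
        below = frozenset().union(*(_leaf_sets(child, out) for child in tree))
    out.append(below)
    return below


def restricted_splits(tree: Tree, common: FrozenSet[str]) -> Set[FrozenSet[str]]:
    """Nontrivial splits of tree|common, each stored as the side without min(common)."""
    nodes: List[FrozenSet[str]] = []
    _leaf_sets(tree, nodes)
    anchor = min(common)
    splits: Set[FrozenSet[str]] = set()
    for clade in nodes:
        side = clade & common
        if 2 <= len(side) <= len(common) - 2:  # skip trivial or empty restrictions
            splits.add(common - side if anchor in side else side)
    return splits


def strict_consensus_on_common(t1: Tree, t2: Tree) -> Set[FrozenSet[str]]:
    """Splits shared by t1 and t2 once both are restricted to their common leaves."""
    common = _leaf_sets(t1, []) & _leaf_sets(t2, [])
    if len(common) < 4:
        return set()  # fewer than four common leaves: no nontrivial splits exist
    return restricted_splits(t1, common) & restricted_splits(t2, common)
```

For a tree on a-g and a tree on a-d, h-j that both contain ab|cd on their common leaves {a, b, c, d}, the result is the single split ab|cd (stored as `frozenset({"c", "d"})`); if the second tree instead has ac|bd, the consensus on the common leaves is unresolved.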
Strict Consensus Merger (SCM)
Figure: SCM of a source tree on leaves a-g and a source tree on leaves a-d and h-j; the common leaf set is {a, b, c, d}

Performance of SCM
Low false positive (FP) rate (estimated supertree has few false edges)
High false negative (FN) rate (estimated supertree is missing many true edges)

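The two error rates can be computed from the nontrivial splits of the true and estimated trees. This sketch assumes both trees are on the same leaf set and their splits are already stored in the same canonical form (for example, as frozensets of one side); `fn_fp_rates` is a hypothetical name.

```python
from typing import FrozenSet, Set, Tuple


def fn_fp_rates(true_splits: Set[FrozenSet[str]],
                est_splits: Set[FrozenSet[str]]) -> Tuple[float, float]:
    """FN rate: fraction of true splits missing from the estimated tree.
    FP rate: fraction of estimated splits that are not in the true tree."""
    fn = len(true_splits - est_splits) / len(true_splits) if true_splits else 0.0
    fp = len(est_splits - true_splits) / len(est_splits) if est_splits else 0.0
    return fn, fp
```

A very unresolved estimate, like a typical SCM, keeps the FP rate low but pushes the FN rate up: if it has one correct split out of three true splits and no others, the FN rate is 2/3 and the FP rate is 0.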
Part 2 of SuperFine
Refine the tree to reduce false negatives by resolving each polytomy using a base supertree method (e.g., MRP)

Part 1 of SuperFine
Figure: computing the SCM of the source tree on leaves a-g and the source tree on leaves a-d and h-j

Part 2 of SuperFine
Figure: the six subtrees around a polytomy of the SCM are labeled 1-6, and each source tree is relabeled with these labels

Step 2: Apply MRP to the collection of reduced source trees
Figure: the reduced source trees, on {1, 2, 3, 4} and {1, 4, 5, 6}, are given to MRP, which returns a tree on {1, 2, 3, 4, 5, 6}

Replace polytomy using tree from MRP
Figure: the MRP tree on {1, ..., 6} replaces the polytomy, and each label is expanded back into its subtree, giving a more resolved supertree on leaves a-j

Resolving a single polytomy v using MRP
Step 1: Reduce each source tree to a tree on the leaf set {1, 2, ..., d}, where d = degree(v), by replacing each leaf with the label of the subtree around v that contains it.
Step 2: Apply MRP to the collection of reduced source trees to produce a tree t on {1, 2, ..., d}.
Step 3: Replace the star tree at v by the tree t.

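Step 1 could be carried out roughly as in the sketch below; it is one plausible reading under stated assumptions, not necessarily SuperFine's exact procedure. It assumes a hypothetical dictionary `label_of` that maps every leaf to the index (1 to d) of the subtree around v containing it, treats the nested-tuple source tree as rooted as written, merges same-labeled siblings, and collapses any clade whose leaves all share one label. It also assumes same-labeled leaves form clades (or sibling groups) of the tree as written and raises `ValueError` if they do not; `reduce_source_tree` is a hypothetical name.

```python
from typing import Dict, List, Union

Tree = Union[str, tuple]     # source tree as nested tuples, rooted as written
Reduced = Union[int, tuple]  # same shape with leaves relabeled 1..d


def reduce_source_tree(tree: Tree, label_of: Dict[str, int]) -> Reduced:
    """Relabel each leaf by its subtree index around v and collapse repeats."""
    def relabel(node: Tree) -> Reduced:
        if isinstance(node, str):
            return label_of[node]
        kids: List[Reduced] = []
        for child in node:
            k = relabel(child)
            if isinstance(k, int) and k in kids:
                continue  # duplicate sibling label: keep a single copy
            kids.append(k)
        # a node left with one child is the whole clade collapsed to one label
        return kids[0] if len(kids) == 1 else tuple(kids)

    def labels(node: Reduced) -> List[int]:
        if isinstance(node, int):
            return [node]
        return [x for child in node for x in labels(child)]

    reduced = relabel(tree)
    found = labels(reduced)
    if len(found) != len(set(found)):
        raise ValueError("a label occurs in more than one place; clade assumption fails")
    return reduced
```

For example, with a and b in subtree 1, c in subtree 2, and d and e in subtree 3, the source tree ((a, b), (c, (d, e))) reduces to `(1, (2, 3))`.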
SuperFine is also much faster
Figure: panels plotted against scaffold density (%); MRP: 8-12 sec., SuperFine: 2-3 sec.

Summary (so far)
Supertree methods are useful for constructing very large species trees from a set of source trees.
The most well known supertree method is MRP, but there are more accurate methods (e.g., MRL, and perhaps quartet-based methods that try to solve Minimum Quartet Distance Supertree).
SuperFine is a technique for improving the speed and accuracy of supertree methods.
CA-ML (concatenation using maximum likelihood) is often more accurate than current supertree methods, but is more computationally intensive.

Limitations of Supertree Methods
Traditional supertree methods assume that the true gene trees match the true species tree.
This is known to be unrealistic in some situations, due to processes such as
Deep coalescence (“incomplete lineage sorting”)
Gene duplication and loss
Horizontal gene transfer

Figure: red gene tree ≠ species tree (green gene tree okay)

Coming up
Supertree methods based on quartets are also good for species tree estimation in the presence of ILS and/or HGT!
Supertree methods are useful for divide-and-conquer methods (e.g., DACTAL).