
CS 598 AGB - PowerPoint Presentation

conchita-marotz
Uploaded On 2016-12-05




Presentation Transcript

Slide1

CS 598 AGB: Supertrees

Tandy Warnow

Slide2

Today’s Material

Supertree construction: given a set of trees on subsets of S (the full set of taxa), construct a tree on the full set S of taxa.

Textbook material: Chapter 5 (Aho, Sagiv, Szymanski, and Ullman) and Chapter 8.1-8.3.

Slide3

Computing a tree from a set of rooted triplet trees

Constructing a rooted tree from a set of compatible rooted triplet trees. Equivalently, test compatibility of a set of rooted triplet trees.

Recursive algorithm by Aho, Sagiv, Szymanski, and Ullman (ASSU)

Chapter 5.1

Slide4

ASSU algorithm

Given set X of k triplet trees on n species:

If n=1, return the tree consisting of that single species.

If n>1, construct a graph with the species as vertices and an edge (a,b) for each triplet ab|c in X.

If the graph has a single component, reject (the set is not compatible); otherwise, recurse on each component (with the triplets of X whose three leaves all lie in that component), and return the tree formed by making the rooted trees on the components each a subtree off a new root.

Slide5
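A minimal Python sketch of the ASSU recursion may help make the steps on the ASSU algorithm slide concrete. It is an illustration under stated assumptions, not the textbook's code: the function name `assu`, the nested-tuple tree representation, and the encoding of a triplet ab|c as the tuple `(a, b, c)` are all choices made here, and components are found with a simple union-find.

```python
from collections import defaultdict


def assu(species, triplets):
    """Return a rooted tree (nested tuples) displaying every triplet, or None.

    Assumed encoding: each triplet (a, b, c) stands for ab|c, and every
    leaf named in a triplet is in `species`.
    """
    species = sorted(species)
    if len(species) == 1:
        return species[0]

    # Union-find over the species: join a and b for every triplet ab|c.
    parent = {s: s for s in species}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _ in triplets:
        parent[find(a)] = find(b)

    components = defaultdict(list)
    for s in species:
        components[find(s)].append(s)
    if len(components) == 1:
        return None  # connected graph: the triplets are not compatible

    subtrees = []
    for members in components.values():
        inside = set(members)
        # Keep only the triplets whose three leaves all lie in this component.
        sub = [t for t in triplets if inside.issuperset(t)]
        child = assu(members, sub)
        if child is None:
            return None
        subtrees.append(child)
    return tuple(subtrees)


compatible = assu(["a", "b", "c", "d"], [("a", "b", "c"), ("c", "d", "a")])
incompatible = assu(["a", "b", "c"], [("a", "b", "c"), ("a", "c", "b")])
```

In this sketch the triplets ab|c and cd|a yield the tree with subtrees (a,b) and (c,d), while ab|c together with ac|b connects the graph on {a,b,c} and is rejected.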

Why does it work?

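If the set X of triplet trees is compatible, then there is a rooted tree T that agrees with every triplet in X and has at least two subtrees off the root, T1 and T2.

Any two leaves a,b in different subtrees off the root cannot be in a triplet ab|c in X (their least common ancestor is the root, so T could not display ab|c).

Hence every edge of the graph joins two leaves in the same subtree, so the graph formed for the set of triplet trees cannot be connected: it must have at least two components.

This argument applies recursively to each component, together with the triplets of X whose leaves all lie in that component.

Hence the algorithm returns a tree on which all the triplet trees agree: for each triplet ab|c, the leaves a and b always fall in the same component, and at the first level where c falls in a different component the returned tree separates a and b from c.

If the set X of triplet trees is not compatible, it is not hard to show that the algorithm will detect this (proof by induction on the number of taxa).

Slide6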

Compatibility of rooted trees

Suppose the input is a set X of rooted trees (not necessarily triplet trees).

Can we use ASSU to determine if X is compatible, and to compute a compatibility supertree for X?

Solution: YES, just encode each rooted tree in X by its set of rooted triplet trees (or some subset of these that suffices to define each tree in X), and then run ASSU.

Slide7
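As a sketch of the triplet encoding used to reduce compatibility of rooted trees to ASSU, the following Python enumerates all rooted triplets displayed by a tree. The nested-tuple representation, the helper names `leaves` and `rooted_triplets`, and the tuple `(a, b, c)` for ab|c are assumptions made for illustration. It emits the full triplet set rather than a smaller defining subset.

```python
from itertools import combinations


def leaves(tree):
    """Return the list of leaf names of a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def rooted_triplets(tree):
    """Return the set of triplets (a, b, c), meaning ab|c, displayed by `tree`."""
    if not isinstance(tree, tuple):
        return set()
    found = set()
    child_leaves = [leaves(child) for child in tree]
    for i, child in enumerate(tree):
        found |= rooted_triplets(child)
        # a and b sit together below this child and c sits below a sibling,
        # so a and b meet strictly below the node where they meet c.
        outside = [c for j, ls in enumerate(child_leaves) if j != i for c in ls]
        for a, b in combinations(sorted(child_leaves[i]), 2):
            for c in outside:
                found.add((a, b, c))
    return found


triplets = rooted_triplets((("a", "b"), ("c", "d")))
```

A star (unresolved) node contributes no triplets of its own, which is why the encoding of a tree with polytomies imposes fewer constraints than that of a binary tree.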

Summary so far

Testing compatibility of an arbitrary set of rooted trees (and constructing compatibility supertree): polynomial time, using ASSU

Testing compatibility of an arbitrary set of unrooted trees (and constructing compatibility supertree): NP-complete!

Special cases for testing compatibility of unrooted trees:

Input has a tree on every four taxa. (Solution: Use All Quartets Method to test for compatibility)

Input trees all have a common species, A. (Solution: root all the input trees using leaf A, and then run ASSU.)

Input has all the “short quartets” of a tree. (Solution: Use Dyadic Closure to test for compatibility, see Chapter 13)

Slide11

Supertree Methods

Most of the time, the input is a set of unrooted source trees that is incompatible.

All the methods described so far only return compatibility supertrees.

How can we construct supertrees from incompatible source trees?

Slide12

Supertree estimation

Challenges:

Tree compatibility is NP-complete (therefore, even if subtrees are correct, supertree estimation is hard)

Estimated subtrees have error

Advantages:

Estimating individual gene trees can be computationally feasible (compared to the combined analysis of many genes)

Can use different types of data for each source tree

Slide13

Many Supertree Methods

MRP

weighted MRP

MRF

MRD

Robinson-Foulds Supertrees

Min-Cut

Modified Min-Cut

Semi-strict Supertree

QMC

Q-imputation

SDM

PhySIC

Majority-Rule Supertrees

Maximum Likelihood Supertrees

and many more ...

Matrix Representation with Parsimony (MRP): most commonly used and most accurate

Slide14

Supertree Optimization Problems

MRP (Matrix Representation with Parsimony)

MRL (Matrix Representation with Likelihood)

RFS (Robinson-Foulds Supertree)

MQDS (Minimum Quartet Distance Supertree)

All of these problems are NP-hard. Some of the methods have good heuristics.

It is easy to see that if the input source trees are compatible, then optimal solutions to MRP, RFS, and MQDS are compatibility supertrees.

Slide15
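To illustrate the "matrix representation" behind MRP and MRL, here is a hedged Python sketch that builds the character matrix only; it does not run a parsimony or likelihood search. It assumes source trees written as rooted nested tuples and uses the usual coding as I understand it: one column per internal edge, with 1 for taxa below the edge, 0 for the other taxa of that source tree, and ? for taxa the tree does not contain. The names `clades` and `mrp_matrix` are hypothetical.

```python
def clades(tree):
    """Return (leaf_set, proper_internal_clades) for a nested-tuple tree."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    all_leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        all_leaves |= child_leaves
        found.extend(child_clades)
        if len(child_leaves) > 1:
            found.append(child_leaves)  # the edge above this child
    return all_leaves, found


def mrp_matrix(source_trees):
    """Return {taxon: string over '0', '1', '?'}, one column per internal edge."""
    per_tree = [clades(t) for t in source_trees]
    taxa = sorted(set().union(*(ls for ls, _ in per_tree)))
    rows = {x: [] for x in taxa}
    for tree_leaves, tree_clades in per_tree:
        for clade in tree_clades:
            for x in taxa:
                if x not in tree_leaves:
                    rows[x].append("?")  # taxon missing from this source tree
                else:
                    rows[x].append("1" if x in clade else "0")
    return {x: "".join(chars) for x, chars in rows.items()}


matrix = mrp_matrix([(("a", "b"), ("c", "d")), (("a", "b"), ("c", "e"))])
```

A supertree method would then search for a tree on all taxa that optimizes the parsimony (MRP) or likelihood (MRL) score of this matrix. That search is the NP-hard part and is left to existing phylogenetics software.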

[Figure: FN rate of MRP vs. combined analysis; x-axis: Scaffold Density (%)]

Slide16

Comparison of Supertree methods and Concatenation

From Swenson et al., Algorithms for Molecular Biology 2010

http://almob.biomedcentral.com/articles/10.1186/1748-7188-5-8

Slide17

Comparison of Supertree Methods

Swenson et al., Algorithms for Molecular Biology 2011

http://almob.biomedcentral.com/articles/10.1186/1748-7188-6-7

Slide18

SuperFine

SuperFine is a technique for improving the speed and accuracy of supertree methods.

The first step computes a “strict consensus merger” (SCM) of the input trees, and the second step refines the SCM using a base supertree method.

The SCM calculation is very fast. The refinement step is applied to each polytomy (node with degree greater than 3) independently, and is fast when the degree is small.

Slide19

SuperFine-boosting: improves accuracy of MRP

[Figure; x-axis: Scaffold Density (%)] (Swenson et al., Syst. Biol. 2012)

Slide20

SuperFine

First, construct a supertree with low false positives: the Strict Consensus Merger (SCM)

Then, refine the tree to reduce false negatives by resolving each polytomy using a base supertree method (e.g., MRP or Quartet Max Cut)

Slide21

Theoretical results for SCM

SCM can be computed in polynomial time

For certain types of inputs, the SCM method solves the NP-hard Tree Compatibility problem

All splits in the SCM appear in at least one source tree (and are not contradicted by any source tree)

Slide22

Comparing Supertree Methods on 1000-taxon datasets

Figure 1 from Nguyen, Mirarab, and Warnow, Algorithms for Molecular Biology 2012

http://almob.biomedcentral.com/articles/10.1186/1748-7188-7-3

Slide23

Obtaining a supertree with low FP

The Strict Consensus Merger (SCM)

SCM of two trees

Computes the strict consensus on the common leaf set

Then superimposes the two trees, contracting more edges in the presence of collisions
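The first SCM step, the strict consensus on the common leaf set, can be sketched in Python as follows. This is only an illustration under assumptions made here: each unrooted tree is given as a pair `(leaves, splits)`, each split is a set holding one side of an internal edge, and the names `restrict` and `strict_consensus_on_common` are hypothetical. The superimposing step and the handling of collisions are not shown.

```python
def restrict(splits, leaves, common):
    """Restrict splits (each a set giving one side) to the `common` leaves."""
    ref = min(common)  # canonical form: keep the side that avoids `ref`
    out = set()
    for side in splits:
        a = frozenset(side) & common
        b = (frozenset(leaves) - frozenset(side)) & common
        if len(a) >= 2 and len(b) >= 2:  # skip splits that became trivial
            out.add(b if ref in a else a)
    return out


def strict_consensus_on_common(tree1, tree2):
    """Return the nontrivial splits both trees induce on their common leaves."""
    (leaves1, splits1), (leaves2, splits2) = tree1, tree2
    common = frozenset(leaves1) & frozenset(leaves2)
    if len(common) < 4:
        return set()  # fewer than four shared leaves: no nontrivial split
    return restrict(splits1, leaves1, common) & restrict(splits2, leaves2, common)


t1 = ({"a", "b", "c", "d", "e"}, [{"a", "b"}, {"d", "e"}])
t2 = ({"a", "b", "c", "d", "f"}, [{"a", "b"}, {"d", "f"}])
t3 = ({"a", "b", "c", "d", "f"}, [{"a", "c"}, {"d", "f"}])
agree = strict_consensus_on_common(t1, t2)
conflict = strict_consensus_on_common(t1, t3)
```

Here `t1` and `t2` both induce ab|cd on the shared leaves, so that split survives; `t1` and `t3` disagree, so the consensus on {a,b,c,d} is a star with no internal splits.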

Slide24

Strict Consensus Merger (SCM)

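[Figure: SCM example. Source trees on leaves a, b, c, d, e, f, g and on leaves a, b, c, d, h, i, j share the leaves a, b, c, d and are merged into a tree on leaves a through j.]

Slide25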

Performance of SCM

Low false positive (FP) rate

(Estimated supertree has few false edges)

High false negative (FN) rate

(Estimated supertree is missing many true edges)

Slide26
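The FP and FN rates used to describe SCM performance can be computed from the internal edges (splits) of the true and estimated trees. The Python sketch below assumes each split is given as some hashable canonical label and normalizes FN by the number of true internal edges and FP by the number of estimated internal edges. The function name `error_rates` and this normalization are assumptions stated here, not taken from the slides.

```python
def error_rates(true_splits, estimated_splits):
    """Return (FN rate, FP rate) for two collections of internal edges."""
    true_splits, estimated_splits = set(true_splits), set(estimated_splits)
    missing = true_splits - estimated_splits  # true edges absent from the estimate
    false = estimated_splits - true_splits    # estimated edges absent from the truth
    fn_rate = len(missing) / len(true_splits) if true_splits else 0.0
    fp_rate = len(false) / len(estimated_splits) if estimated_splits else 0.0
    return fn_rate, fp_rate


# A fully resolved true tree versus a poorly resolved estimate (hypothetical data).
true_tree = {"ab|cdef", "abc|def", "abcd|ef"}
estimate = {"ab|cdef"}
fn, fp = error_rates(true_tree, estimate)
```

In this toy case the estimate contains no false edge but misses two of the three true edges, which is the low-FP, high-FN pattern described for the SCM.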

Part II of SuperFine

Refine the tree to reduce false negatives by resolving each polytomy using a base supertree method (e.g., MRP)

Slide27

Part 1 of SuperFine

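[Figure: SCM example. Source trees on leaves a, b, c, d, e, f, g and on leaves a, b, c, d, h, i, j share the leaves a, b, c, d and are merged into a tree on leaves a through j.]

Slide28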

Part 2 of SuperFine

[Figure: the SCM tree on leaves a through j with the branches around a polytomy labeled 1 to 6, and the two source trees with their leaves relabeled by these numbers.]

Slide29

Step 2: Apply MRP to the collection of reduced source trees

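[Figure: reduced source trees on leaves 1, 2, 3, 4 and on leaves 1, 4, 5, 6 are given to MRP, which returns a tree on leaves 1 through 6.]

Slide30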

Replace polytomy using tree from MRP

[Figure: the MRP tree on leaves 1 through 6 replaces the polytomy, giving a refined supertree on leaves a through j.]

Slide31

Resolving a single polytomy, v, using MRP

Step 1: Reduce each source tree to a tree on leafset {1,2,...,d}, where d=degree(v)

Step 2: Apply MRP to the collection of reduced source trees, to produce a tree t on {1,2,...,d}

Step 3: Replace the star tree at v by tree t

Slide32
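Step 1 of resolving a polytomy v relies on relabeling every leaf by the edge at v that leads to it. A minimal Python sketch of that labeling follows, assuming the SCM tree is stored as an adjacency dictionary and that the function name `polytomy_labels` and the example tree are hypothetical. Collapsing the leaves of a source tree that receive the same label, which the full reduction also needs, is not shown.

```python
def polytomy_labels(adjacency, v):
    """Map each leaf to the index (1..d) of the edge at `v` that leads to it."""
    labels = {}
    for index, start in enumerate(adjacency[v], start=1):
        stack, seen = [start], {v, start}
        while stack:
            node = stack.pop()
            if len(adjacency[node]) == 1:  # a leaf has a single neighbor
                labels[node] = index
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return labels


# Hypothetical SCM tree: node "v" is a polytomy of degree 4.
scm_tree = {
    "v": ["u1", "c", "u2", "d"],
    "u1": ["v", "a", "b"],
    "u2": ["v", "e", "f"],
    "a": ["u1"], "b": ["u1"], "c": ["v"],
    "d": ["v"], "e": ["u2"], "f": ["u2"],
}
labels = polytomy_labels(scm_tree, "v")
# Relabel the leaves of one source tree on {a, c, e, d}.
relabeled = {leaf: labels[leaf] for leaf in ["a", "c", "e", "d"]}
```

Because each polytomy is handled independently and the reduced trees have only d leaves, the base method is run on small inputs, which is consistent with the speed claims for SuperFine.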

SuperFine-boosting: improves accuracy of MRP

[Figure; x-axis: Scaffold Density (%)] (Swenson et al., Syst. Biol. 2012)

Slide33

SuperFine is also much faster

MRP: 8-12 sec.

SuperFine: 2-3 sec.

[Figure: three panels, each with x-axis Scaffold Density (%)]

Slide34

Summary (so far)

Supertree methods are useful for constructing very large species trees from a set of source trees.

The best-known supertree method is MRP, but there are more accurate methods (e.g., MRL, and perhaps quartet-based methods that try to solve Minimum Quartet Distance Supertree).

SuperFine is a technique for improving the speed and accuracy of supertree methods.

CA-ML (concatenation using maximum likelihood) is often more accurate than current supertree methods, but is more computationally intensive.

Slide35

Limitations of Supertree Methods

Traditional supertree methods assume that the true gene trees match the true species tree.

This is known to be unrealistic in some situations, due to processes such as

Deep coalescence (incomplete lineage sorting)

Gene duplication and loss

Horizontal gene transfer

Slide36

[Figure: the red gene tree differs from the species tree (the green gene tree is okay)]

Slide37

Coming up

Supertree methods based on quartets are also good for species tree estimation in the presence of ILS and/or HGT!

Supertree methods are useful for divide-and-conquer methods (e.g., DACTAL).
