
Practical on phylogenetic trees based on - PowerPoint Presentation

lois-ondreau
Uploaded On 2016-10-13


Presentation Transcript

Slide1

Practical on phylogenetic trees based on sequence alignments

Kyrylo Bessonov

November 26th, 2013

Slide2

Talk plan

- How to build phylogenetic trees of types
  - Unrooted
  - Rooted
- Context
  - comparison of viral proteins of dengue virus
- Examples on phylogenetic tree building

Dengue virus

Slide3

Building a phylo tree using ape

- Ape: Analyses of Phylogenetics and Evolution
  - Functions to create and manipulate phylo trees
  - Graphical exploration of phylogenetic data
- To build a phylogenetic tree
  - Download protein sequences from DB
  - Align sequences
  - Calculate pairwise distance using ape
  - Visualize a phylogenetic tree

Slide4

Building an unrooted phylogenetic tree (1)

#install req. libraries
install.packages("seqinr")
install.packages("muscle")
install.packages("ape")
library("seqinr")
library("muscle")
library("ape")

multipleSeqAlignment <- function(seqnames, seqs){
  # umax is an object of class fasta from muscle package
  fasta_seqs_Object = umax;
  # one row per sequence: V1 = accession, V2 = sequence as a single string
  tmp = data.frame(V1=rep(0,length(seqs)),V2=rep(0,length(seqs)))
  for(i in 1:length(seqs)){
    tmp[i,1]=seqnames[i]
    tmp[i,2]=paste(seqs[[i]],collapse="")
  }
  fasta_seqs_Object$seqs = tmp
  #multiple sequence alignment
  #remove conflicting ape library from the memory
  try(detach("package:ape"), silent=TRUE)
  alignment=muscle(seqs = fasta_seqs_Object, out = NULL)
  alignment_ape=ape::as.alignment(matrix(alignment$seqs[,2]))
  alignment_ape$nam = alignment$seqs[,1]
  return (alignment_ape)
}

Slide5

Building an unrooted phylogenetic tree (2)

#main part of the code
choosebank("swissprot") #selects database for query
seqnames <- c("P06747", "P0C569", "O56773", "Q5VKP1")
seqs=list()
for(i in 1:length(seqnames)){
  query <- query(paste("AC=",seqnames[i],sep=""))
  seqs[i]=getSequence(query)
}
# multipleSeqAlignment() is defined on previous slide
alignment_ape <- multipleSeqAlignment(seqnames, seqs);
mydist <- dist.alignment(alignment_ape)
# multipleSeqAlignment() detached ape, so attach it again before calling nj()
library("ape")
# nj() performs the neighbor-joining tree estimation by Saitou and Nei
mytree <- nj(mydist)
mytree$tip.label=c("Q5VKP1-\nWestern Caucasian bat virus\nphosphoprotein","P06747-\nrabies virus\nphosphoprotein","P0C569-\nMokola virus\nphosphoprotein","O56773-\nLagos bat virus\nphosphoprotein")
plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=0.8, no.margin=TRUE, srt=50)

Slide6
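For intuition about the numbers that dist.alignment() produces on the preceding slide: according to the seqinr documentation, with the default identity matrix each distance is the square root of one minus the fraction of identical aligned positions. The base-R sketch below reproduces that calculation on three short toy sequences. The sequences and the helper name identity_distance are hypothetical, the sequences are assumed to be already aligned, and gap handling is ignored.

```r
# Toy, pre-aligned protein sequences (hypothetical; equal length, no gaps)
aln <- c(seqA = "MKTAYIAKQR", seqB = "MKTAYIAKQL", seqC = "MKSAYLAKQL")

# Square root of (1 - fraction of identical positions) for two aligned strings
identity_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  sqrt(1 - mean(x == y))
}

# Fill a symmetric matrix with all pairwise distances
n <- length(aln)
d <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- identity_distance(aln[[i]], aln[[j]])
  }
}

print(round(d, 3))
toy_dist <- as.dist(d)  # same object class that nj() accepts
```

seqA and seqB differ at one position out of ten, so their distance is sqrt(0.1); seqA and seqC differ at three positions, giving sqrt(0.3).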

Unrooted Phylogenetic Tree

Phylogenetic tree showing the distances between 4 viral protein sequences

- the genetic distance between O56773 and P0C569 is the smallest

Slide7

Unrooted phylogenetic tree (1)

- The lengths of the branches in the plot of the tree are proportional to the amount of evolutionary change (estimated by number of mutations) along the tree branches
- This is an unrooted phylogenetic tree as it does not contain an outgroup sequence, that is, a sequence of a protein that is known to be more distantly related to the other proteins in the tree than they are to each other.

Slide8

Unrooted phylogenetic tree (2)

As a result, we cannot tell which direction evolutionary time ran in along the internal branches of the tree. For example, we cannot tell whether the node representing the common ancestor of (O56773, P0C569) was an ancestor of the node representing the common ancestor of (Q5VKP1, P06747), or the other way around.

Slide9

Distance matrix

- Inspecting the calculated distance matrix between the aligned sequences confirms the results seen in the phylogenetic tree
- The closest pair is the O56773 and P0C569 proteins

         Q5VKP1  P06747  P0C569
P06747   0.49
P0C569   0.48    0.45
O56773   0.50    0.46    0.41

Slide10
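The closest pair in the distance matrix on the preceding slide can also be located programmatically rather than by eye. The base-R sketch below re-enters the six published distances by hand and reports the minimum; the variable names are illustrative, and in a live session the same lookup could be applied to as.matrix(mydist) instead of hand-typed values.

```r
# Accessions in the row/column order of the slide's table
ids <- c("Q5VKP1", "P06747", "P0C569", "O56773")

# Lower-triangle distances copied from the slide, column by column
lower <- c(0.49, 0.48, 0.50,
           0.45, 0.46,
           0.41)

# Leave the diagonal and upper triangle as NA so each pair is counted once
m <- matrix(NA_real_, 4, 4, dimnames = list(ids, ids))
m[lower.tri(m)] <- lower

# Row/column position of the smallest distance
hit <- which(m == min(m, na.rm = TRUE), arr.ind = TRUE)
closest_pair <- c(rownames(m)[hit[1, "row"]], colnames(m)[hit[1, "col"]])
closest_dist <- m[hit[1, "row"], hit[1, "col"]]

print(closest_pair)
print(closest_dist)
```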

Rooted phylogenetic tree

- In order to convert the unrooted tree into a rooted tree, we need to add an outgroup sequence
- Outgroup: a taxon outside the group of interest
  - will branch off at the base of the phylogeny
  - Caenorhabditis elegans (UniProt accession Q10572) and Caenorhabditis remanei (UniProt E3M2K8)
- If we were to build a phylogenetic tree of the Fox-1 homologues in vertebrates, the distantly related sequences from worms would probably be a good choice of outgroup, since the protein is from a different taxon/group (worms)

Slide11

Building a rooted phylogenetic tree (1)

#BUILDING ROOTED TREE OF PROTEIN SEQUENCES (FOX1)
#Q9NWB1 - Human
#Q17QD3 - Cow
#Q95KI0 - Monkey
#A1A5R1 - Rat
#Q10572 - Worm C.elegans (Root)
#E1G4K8 - Eye worm
seqnames <- c("Q9NWB1","Q17QD3","Q95KI0","A1A5R1","Q10572","E1G4K8")
choosebank("swissprot") #selects database for query
seqs=list()
for(i in 1:length(seqnames)){
  query <- query(paste("AC=",seqnames[i],sep=""))
  seqs[i]=getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs);
mydist <- dist.alignment(alignment_ape)

Slide12

Building a rooted phylogenetic tree (2)

library("ape")
mytree <- nj(mydist)
mytree$tip.label=c("E1G4K8-Eye worm ", "Q10572-C.elegans(Root)", "A1A5R1-Rat", "Q9NWB1-Human", "Q17QD3-Cow", "Q95KI0-Monkey")
# resolve.root=TRUE (abbreviated to r=TRUE on the original slide) makes the root a bifurcating node
myrootedtree <- root(mytree, outgroup="Q10572-C.elegans(Root)", resolve.root=TRUE)
#Phylogenetic tree with 6 tips and 5 internal nodes.
#Tip labels:
#[1] "E1G4K8" "Q8WS01" "Q9VT99" "A8NSK3" "Q10572" "E3M2K8"
#Rooted; includes branch lengths.
plot.phylo(myrootedtree, edge.color = "blue", edge.width = 3 , type="p")

Slide13

Rooted tree of FOX1 proteins

- The invertebrates are grouped together
- Worms form a distinct group yet with large genetic distance
- Human FOX1 is closest to monkey and cow sequences

Figure label: outgroup (worms)

Slide14

Distance matrix

         E1G4K8  Q10572  A1A5R1  Q9NWB1  Q17QD3
Q10572   0.72
A1A5R1   0.75    0.63
Q9NWB1   0.72    0.62    0.44
Q17QD3   0.73    0.62    0.50    0.28
Q95KI0   0.73    0.61    0.49    0.28    0.14

- As expected, eye worms are the most distantly related species to vertebrates
- Cow and monkey have the closest relationship and the lowest genetic distance

Table legend:

- Q9NWB1 - Human
- Q95KI0 - Monkey
- Q10572 - Worm C.elegans (Root)
- Q17QD3 - Cow
- A1A5R1 - Rat
- E1G4K8 - Eye worm

Slide15
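The FOX1 distance matrix on the preceding slide is enough to rebuild and root the tree without querying the database again. The sketch below assumes the ape package is installed and re-enters the published distances by hand, so small rounding differences from the original run are possible. It uses the bare accession Q10572 as the outgroup label instead of the relabelled tip name used on the slides.

```r
library(ape)  # assumes the ape package is installed

# Accessions in the row/column order of the slide's table
ids <- c("E1G4K8", "Q10572", "A1A5R1", "Q9NWB1", "Q17QD3", "Q95KI0")

# Lower-triangle distances copied from the slide, column by column
lower <- c(0.72, 0.75, 0.72, 0.73, 0.73,
           0.63, 0.62, 0.62, 0.61,
           0.44, 0.50, 0.49,
           0.28, 0.28,
           0.14)

m <- matrix(0, 6, 6, dimnames = list(ids, ids))
m[lower.tri(m)] <- lower
fox_dist <- as.dist(m)    # as.dist() reads the lower triangle

unrooted <- nj(fox_dist)  # neighbour-joining tree, unrooted
rooted <- root(unrooted, outgroup = "Q10572", resolve.root = TRUE)

print(is.rooted(unrooted))
print(is.rooted(rooted))
plot(rooted, edge.color = "blue", edge.width = 3)
```

With resolve.root = TRUE the root becomes a bifurcating node, which is why the rooted tree has one more internal node than the unrooted one.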

Rooted tree

- Time runs from left to right
- Monkey, Cow and Human have common ancestor 3
- Ancestor 1 is common to ancestors 2 and 3

Figure axis label: TIME

Slide16

Exercises on phylogenetic tree building

Q1. Calculate the genetic distances between the following NS1 proteins from different Dengue virus strains: Dengue virus 1 NS1 protein (UniProt: Q9YRR4), Dengue virus 2 NS1 protein (UniProt: Q9YP96), Dengue virus 3 NS1 protein (UniProt: B0LSS3), and Dengue virus 4 NS1 protein (UniProt: Q6TFL5). Which viruses are the most closely related, and which are the least closely related, based on the genetic distances?

Note: Dengue virus causes Dengue fever, which is classified by the WHO as a neglected tropical disease. There are four main types of Dengue virus: Dengue virus 1, Dengue virus 2, Dengue virus 3, and Dengue virus 4.

Q2. Build an unrooted phylogenetic tree of the NS1 proteins from Dengue virus 1, Dengue virus 2, Dengue virus 3 and Dengue virus 4, using the neighbour-joining algorithm. Which are the most closely related proteins, based on the tree?

Slide17

Exercises on phylogenetic tree building

Q3. The Zika virus is related to Dengue viruses, but is not a Dengue virus, and can therefore be used as an outgroup in phylogenetic trees of Dengue virus sequences. UniProt accession Q32ZE1 consists of a sequence with similarity to the Dengue NS1 protein, so it seems to be a related protein from Zika virus. Build a rooted phylogenetic tree of the Dengue NS1 proteins based on an alignment, using the Zika virus protein as the outgroup. Which are the most closely related Dengue virus proteins, based on the tree? What extra information does this tree tell you, compared to the unrooted tree in Q2?

Slide18

Answers

Question 1:

Summary of viral proteins and UniProt accession numbers:

- UniProt: Q9YRR4 Dengue virus 1 NS1 protein
- UniProt: Q9YP96 Dengue virus 2 NS1 protein
- UniProt: B0LSS3 Dengue virus 3 NS1 protein
- UniProt: Q6TFL5 Dengue virus 4 NS1 protein

seqnames <- c("Q9YRR4","Q9YP96","B0LSS3","Q6TFL5")
choosebank("swissprot") #selects database for query
seqs=list()
for(i in 1:length(seqnames)){
  query <- query(paste("AC=",seqnames[i],sep=""))
  seqs[i]=getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs);
mydist <- dist.alignment(alignment_ape);
mydist

Slide19

Answers

Q1. The distance matrix is as follows

         Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4   0.306
Q9YP96   0.333   0.254
B0LSS3   0.297   0.230   0.227

The most distant are Q9YP96 (V2) and Q6TFL5 (V4), with a genetic distance of 0.333, while the most closely related are Q9YP96 (V2) and B0LSS3 (V3), with a genetic distance of 0.227

Slide20

Answers

Question 2:

library("ape")
mytree <- nj(mydist)
#plotting unrooted tree
plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=1.2, no.margin=TRUE, srt=0)

#clean the sequences from gaps
seqs_trim = seqs
for(i in 1:length(seqs)){
  start=regexpr("DMGY", paste(seqs_trim[[i]],collapse="") ) [1]
  stop=regexpr("GEDG", paste(seqs_trim[[i]],collapse="") ) [1]
  seqs_trim[[i]]=seqs_trim[[i]][start:stop]
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs_trim);
mydist <- dist.alignment(alignment_ape);
mydist
library("ape")
mytree <- nj(mydist)
#plotting unrooted tree based on alignment of the trimmed sequences
plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=1.2, no.margin=TRUE, srt=0)

Slide21
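The trimming loop on the preceding slide relies on regexpr(), which returns the position where a motif starts, or -1 when the motif is absent. Because the loop subsets with start:stop, the kept region ends at the first residue of the "GEDG" motif, and a missing motif would give an unintended index range. The base-R sketch below shows one way to make both points explicit on a toy sequence. The helper name trim_between and the toy sequence are hypothetical, and keeping the whole end motif is a design choice of this sketch, not something the slides do.

```r
# Toy sequence as a vector of single characters (hypothetical, not a real NS1 protein)
toy <- strsplit("AAADMGYWIESGEDGTTT", "")[[1]]

# Keep the region from the start of `start_motif` to the end of `end_motif`.
# Returns NULL when a motif is missing or the motifs occur in the wrong order.
trim_between <- function(chars, start_motif, end_motif) {
  s <- paste(chars, collapse = "")
  first <- regexpr(start_motif, s, fixed = TRUE)
  last <- regexpr(end_motif, s, fixed = TRUE)
  first_pos <- as.integer(first)
  last_pos <- as.integer(last)
  if (first_pos == -1L || last_pos == -1L || last_pos < first_pos) {
    return(NULL)
  }
  # match.length lets us include the full end motif, not only its first residue
  last_end <- last_pos + attr(last, "match.length") - 1L
  chars[first_pos:last_end]
}

trimmed <- trim_between(toy, "DMGY", "GEDG")
print(paste(trimmed, collapse = ""))

missing_motif <- trim_between(toy, "XXXX", "GEDG")
print(is.null(missing_motif))
```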

Answers

Question 2 (continued):

alignment_ape <- multipleSeqAlignment(seqnames, seqs_trim);
mydist <- dist.alignment(alignment_ape);
mydist
library("ape")
mytree <- nj(mydist)
# tree based on the best aligned portion
plot.phylo(mytree,type="u", edge.color = "blue", edge.width = 3, cex=1.2, no.margin=TRUE, srt=0)

Slide22

Answers

The resulting Q2 unrooted tree

- This unrooted tree agrees with the genetic distance matrix calculated in Q1. The tree suggests that B0LSS3 and Q9YP96 are the most closely related proteins.
- To improve the quality of the tree, it is best to select a region that has a minimal number of gaps between the protein sequences
- Below you can see that there are regions with lots of gaps. Let's build another tree based on the bolded (most conserved) region to see if it is the same

Alignment of proteins (bold formatting was lost in this transcript; the middle block begins with DMGY, the start motif used for trimming in the Question 2 code):

Q6TFL5  DMGCVVSWNGKELKC…KDQKAVHA  DMGYWIESSKNQTWQIEKASLIEVKTCLWPKTHTL…GMEI  RPLSEKEENMVKSQVTA
Q9YRR4  ------------------------  DMGYWIESEKNETWKLARASFIEVKTCIWPKSHTL…GMEI  -----------------
Q9YP96  DSGCVVSWKNKELKC…KDNRAVHA  DMGYWIESALNDTWKIEKASFIEVKNCHWPKSHTL GMEI  RPLKEKEENLVNSLVTA
B0LSS3  --------------------ASHA  DMGYWIESQKNGSWKLEKASLIEVKTCTWPKSHTL  …------------------------

Figure caption: Built using the full lengths of proteins

Slide23

Answers

The resulting tree looks the same, but we achieved overall better resolution between proteins

Best aligned portion of protein sequences used (tree built using the bolded region):

         Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4   0.317
Q9YP96   0.317   0.264
B0LSS3   0.292   0.233   0.216

Whole protein sequences used:

         Q6TFL5  Q9YRR4  Q9YP96
Q9YRR4   0.306
Q9YP96   0.332   0.254
B0LSS3   0.297   0.230   0.227

Slide24
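The two distance matrices on the preceding slide (whole proteins versus the best aligned portion) can be compared pair by pair. The base-R sketch below re-enters both sets of published values by hand and checks whether the closest pair is the same in each; the variable names are illustrative only.

```r
ids <- c("Q6TFL5", "Q9YRR4", "Q9YP96", "B0LSS3")

# Pair labels in the same order as the lower triangle read column by column
pairs <- combn(ids, 2, paste, collapse = "-")

# Distances copied from the slide, column by column
whole   <- setNames(c(0.306, 0.332, 0.297, 0.254, 0.230, 0.227), pairs)
trimmed <- setNames(c(0.317, 0.317, 0.292, 0.264, 0.233, 0.216), pairs)

comparison <- data.frame(
  whole   = whole,
  trimmed = trimmed,
  change  = round(trimmed - whole, 3)
)
print(comparison)

closest_whole   <- names(which.min(whole))
closest_trimmed <- names(which.min(trimmed))
print(c(whole = closest_whole, trimmed = closest_trimmed))
```

In both matrices the smallest distance belongs to the Q9YP96 and B0LSS3 pair, which is consistent with the statement that the tree looks the same after trimming.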

Answers

Question 3:

#Q3 building rooted tree based on Q89277 (yellow fever virus) as out group
library("seqinr")
library("muscle")
library("ape")
seqnames <- c("Q9YRR4","Q9YP96","B0LSS3","Q6TFL5", "Q89277")
choosebank("swissprot") #selects database for query
seqs=list()
for(i in 1:length(seqnames)){
  query <- query(paste("AC=",seqnames[i],sep=""))
  seqs[i]=getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs);
mydist <- dist.alignment(alignment_ape);
mydist
library("ape")
mytree <- nj(mydist)
# resolve.root=TRUE (abbreviated to r=TRUE on the original slide) makes the root a bifurcating node
myrootedtree <- root(mytree, outgroup="Q89277", resolve.root=TRUE)
plot.phylo(myrootedtree ,type="p", edge.color = "blue", edge.width = 3, cex=1.2, no.margin=TRUE, srt=0)

Slide25

Answers

- Q3 asks to build a rooted tree using out-group yellow fever virus (Q89277)
- Most closely related viruses: B0LSS3 and Q9YP96
- This rooted tree tells you which of the Dengue virus NS1 proteins branched off the earliest from the ancestors.
- An unrooted tree does not provide ancestry information (i.e. time sequence)

         Q89277  Q6TFL5  Q9YRR4  Q9YP96
Q6TFL5   0.523
Q9YRR4   0.511   0.306
Q9YP96   0.486   0.333   0.254
B0LSS3   0.487   0.297   0.230   0.227

Figure label: outgroup

Slide26

References

Ape library for phylogenetic trees and ancestry with bootstrap methods

http://cran.r-project.org/web/packages/ape/ape.pdf