Slide1
Practical on phylogenetic trees based on sequence alignments
Kyrylo Bessonov
November 26th, 2013
Slide2
Talk plan
How to build phylogenetic trees of two types:
Unrooted
Rooted
Context: comparison of viral proteins of dengue virus
Examples on phylogenetic tree building: Dengue virus
Slide3
Building a phylo tree using ape
ape: Analyses of Phylogenetics and Evolution
Functions to create and manipulate phylo trees
Graphical exploration of phylogenetic data
To build a phylogenetic tree:
Download protein sequences from a database
Align the sequences
Calculate pairwise distances (the code on the following slides uses dist.alignment() from seqinr)
Build and visualize the phylogenetic tree using ape
Slide4
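As a rough illustration of the distance-calculation step in the workflow listed on the previous slide, the base-R sketch below computes a pairwise distance as the square root of the fraction of mismatched alignment columns. The helper name `identity_distance` and the toy aligned sequences are hypothetical, and this is a simplified stand-in rather than the exact calculation performed by the packages used in this practical.

```r
# Hypothetical helper: distance between two aligned sequences of equal length,
# defined here as sqrt(1 - fraction of identical columns).
identity_distance <- function(a, b) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  # Ignore columns where either sequence has a gap
  keep <- x != "-" & y != "-"
  sqrt(1 - mean(x[keep] == y[keep]))
}

# Toy aligned sequences (made up for illustration)
aln <- c(s1 = "DMGYWIES", s2 = "DMGYWLES", s3 = "DM-YWLQS")

# Fill a square matrix with all pairwise distances, then convert to a dist object
n <- length(aln)
d <- matrix(0, n, n, dimnames = list(names(aln), names(aln)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- identity_distance(aln[i], aln[j])
  }
}
toy_dist <- as.dist(d)
print(round(toy_dist, 3))
```

A `dist` object of this kind is the sort of input that a distance-based tree method such as neighbour-joining takes.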
Building an unrooted phylogenetic tree (1)
# install the required libraries
install.packages("seqinr")
install.packages("muscle")
install.packages("ape")
library("seqinr")
library("muscle")
library("ape")

# Align a list of sequences and return the alignment in ape format
multipleSeqAlignment <- function(seqnames, seqs) {
  # umax is an object of class fasta from the muscle package; used here as a template
  fasta_seqs_Object <- umax
  # one row per sequence: V1 = name, V2 = sequence as a single string
  tmp <- data.frame(V1 = rep(0, length(seqs)), V2 = rep(0, length(seqs)))
  for (i in seq_along(seqs)) {
    tmp[i, 1] <- seqnames[i]
    tmp[i, 2] <- paste(seqs[[i]], collapse = "")
  }
  fasta_seqs_Object$seqs <- tmp
  # multiple sequence alignment
  # remove the conflicting ape library from memory
  try(detach("package:ape"), silent = TRUE)
  alignment <- muscle(seqs = fasta_seqs_Object, out = NULL)
  alignment_ape <- ape::as.alignment(matrix(alignment$seqs[, 2]))
  alignment_ape$nam <- alignment$seqs[, 1]
  return(alignment_ape)
}
Slide5
Building an unrooted phylogenetic tree (2)
# main part of the code
choosebank("swissprot") # selects database for query
seqnames <- c("P06747", "P0C569", "O56773", "Q5VKP1")
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
# multipleSeqAlignment() is defined on the previous slide
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)
# multipleSeqAlignment() detaches ape, so load it again before calling nj()
library("ape")
# nj() performs the neighbor-joining tree estimation by Saitou and Nei
mytree <- nj(mydist)
# the new labels must be given in the same order as the existing mytree$tip.label
mytree$tip.label <- c("Q5VKP1-\nWestern Caucasian bat virus\nphosphoprotein",
                      "P06747-\nrabies virus\nphosphoprotein",
                      "P0C569-\nMokola virus\nphosphoprotein",
                      "O56773-\nLagos bat virus\nphosphoprotein")
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 0.8, no.margin = TRUE, srt = 50)
Slide6
Unrooted Phylogenetic Tree
Phylogenetic tree showing the distances between 4 viral protein sequences
The genetic distance between O56773 and P0C569 is the smallest
Slide7
Unrooted phylogenetic tree (1)
The lengths of the branches in the plot of the tree are proportional to the amount of evolutionary change (estimated by the number of mutations) along the tree branches
This is an unrooted phylogenetic tree, as it does not contain an outgroup sequence, that is, a sequence of a protein that is known to be more distantly related to the other proteins in the tree than they are to each other.
Slide8
Unrooted phylogenetic tree (2)
As a result, we cannot tell which direction evolutionary time ran in along the internal branches of the tree. For example, we cannot tell whether the node representing the common ancestor of (O56773, P0C569) was an ancestor of the node representing the common ancestor of (Q5VKP1, P06747), or the other way around.
Slide9
Distance matrix
Inspecting the calculated distance matrix between the aligned sequences confirms the results seen in the phylogenetic tree
The closest pair is the O56773 and P0C569 proteins (distance 0.41)

         Q5VKP1   P06747   P0C569
P06747   0.49
P0C569   0.48     0.45
O56773   0.50     0.46     0.41
Slide10
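To make the "closest pair" reading of the lyssavirus phosphoprotein distance matrix concrete, here is a small base-R sketch that stores the lower-triangular values from that matrix and looks up the smallest off-diagonal entry. The variable names are illustrative only; the numbers are copied from the slide.

```r
# Distances copied from the lower-triangular matrix on the slide
ids <- c("Q5VKP1", "P06747", "P0C569", "O56773")
m <- matrix(0, 4, 4, dimnames = list(ids, ids))
m[lower.tri(m)] <- c(0.49, 0.48, 0.50,   # column Q5VKP1
                     0.45, 0.46,         # column P06747
                     0.41)               # column P0C569
m <- m + t(m)   # mirror the lower triangle to make the matrix symmetric

# Mask the diagonal and locate the smallest off-diagonal distance
masked <- m
diag(masked) <- NA
idx <- which(masked == min(masked, na.rm = TRUE), arr.ind = TRUE)[1, ]
closest_pair <- sort(ids[idx])
print(closest_pair)
```

The same lookup with `max()` in place of `min()` would give the most distant pair.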
Rooted phylogenetic tree
In order to convert the unrooted tree into a rooted tree, we need to add an outgroup sequence
Outgroup: a taxon outside the group of interest that will branch off at the base of the phylogeny
Example outgroup sequences: Caenorhabditis elegans (UniProt accession Q10572) and Caenorhabditis remanei (UniProt E3M2K8)
If we were to build a phylogenetic tree of the Fox-1 homologues in vertebrates, the distantly related sequence from worms would probably be a good choice of outgroup, since the protein is from a different taxon/group (worms)
Slide11
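The effect of adding an outgroup can be sketched with ape on a toy tree. The Newick string, tip names and branch lengths below are made up for illustration; `read.tree()`, `is.rooted()` and `root()` are ape functions, and the ape package is assumed to be installed.

```r
library(ape)

# Toy unrooted tree in Newick format; tip names and lengths are hypothetical
tr <- read.tree(text = "(worm:0.6,human:0.2,(cow:0.1,monkey:0.1):0.1);")
is.rooted(tr)        # FALSE: the basal node has three descendants

# Re-root on the outgroup; resolve.root = TRUE makes the root bifurcating
rooted <- root(tr, outgroup = "worm", resolve.root = TRUE)
is.rooted(rooted)    # TRUE
plot(rooted)
```

In the rooted plot the outgroup tip branches off at the base, which is what gives the remaining branches a direction in time.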
Building a rooted phylogenetic tree (1)
# BUILDING ROOTED TREE OF PROTEIN SEQUENCES (FOX1)
# Q9NWB1 - Human
# Q17QD3 - Cow
# Q95KI0 - Monkey
# A1A5R1 - Rat
# Q10572 - Worm C.elegans (Root)
# E1G4K8 - Eye worm
seqnames <- c("Q9NWB1", "Q17QD3", "Q95KI0", "A1A5R1", "Q10572", "E1G4K8")
choosebank("swissprot") # selects database for query
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)
Slide12
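Building a rooted phylogenetic tree (2)
library("ape")
mytree <- nj(mydist)
# the new labels must be given in the same order as the existing mytree$tip.label
mytree$tip.label <- c("E1G4K8-Eye worm ", "Q10572-C.elegans(Root)", "A1A5R1-Rat", "Q9NWB1-Human", "Q17QD3-Cow", "Q95KI0-Monkey")
# root the tree on the outgroup; resolve.root=TRUE gives a bifurcating root
myrootedtree <- root(mytree, outgroup = "Q10572-C.elegans(Root)", resolve.root = TRUE)
# Printing myrootedtree gives a summary of this form
# (the tip labels shown here do not match the labels set above):
#Phylogenetic tree with 6 tips and 5 internal nodes.
#Tip labels:
#[1] "E1G4K8" "Q8WS01" "Q9VT99" "A8NSK3" "Q10572" "E3M2K8"
#Rooted; includes branch lengths.
plot.phylo(myrootedtree, edge.color = "blue", edge.width = 3, type = "p")
Slide13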
Rooted tree of FOX1 proteins
The invertebrates are grouped together
Worms form a distinct group, yet with a large genetic distance between them
Human FOX1 is closest to the monkey and cow sequences
Figure label: outgroup (worms)
Slide14
Distance matrix
         E1G4K8   Q10572   A1A5R1   Q9NWB1   Q17QD3
Q10572   0.72
A1A5R1   0.75     0.63
Q9NWB1   0.72     0.62     0.44
Q17QD3   0.73     0.62     0.50     0.28
Q95KI0   0.73     0.61     0.49     0.28     0.14

As expected, eye worms are the most distantly related species to the vertebrates
Cow and monkey have the closest relationship and the lowest genetic distance (0.14)
Table legend:
Q9NWB1 - Human
Q95KI0 - Monkey
Q10572 - Worm C.elegans (Root)
Q17QD3 - Cow
A1A5R1 - Rat
E1G4K8 - Eye worm
Slide15
Rooted tree
Time runs from left to right
Monkey, Cow and Human have common ancestor 3
Ancestor 1 is common to ancestors 2 and 3
Figure label: TIME
Slide16
Exercises on phylogenetic tree building
Q1. Calculate the genetic distances between the following NS1 proteins from different Dengue virus strains: Dengue virus 1 NS1 protein (UniProt: Q9YRR4), Dengue virus 2 NS1 protein (UniProt: Q9YP96), Dengue virus 3 NS1 protein (UniProt: B0LSS3), and Dengue virus 4 NS1 protein (UniProt: Q6TFL5). Which viruses are the most closely related, and which are the least closely related, based on the genetic distances? Note: Dengue virus causes Dengue fever, which is classified by the WHO as a neglected tropical disease. There are four main types of Dengue virus: Dengue virus 1, Dengue virus 2, Dengue virus 3, and Dengue virus 4.
Q2. Build an unrooted phylogenetic tree of the NS1 proteins from Dengue virus 1, Dengue virus 2, Dengue virus 3 and Dengue virus 4, using the neighbour-joining algorithm. Which are the most closely related proteins, based on the tree?
Slide17
Exercises on phylogenetic tree building
Q3. The Zika virus is related to Dengue viruses, but is not a Dengue virus, and can therefore be used as an outgroup in phylogenetic trees of Dengue virus sequences. UniProt accession Q32ZE1 consists of a sequence with similarity to the Dengue NS1 protein, so it seems to be a related protein from Zika virus. Build a rooted phylogenetic tree of the Dengue NS1 proteins based on an alignment, using the Zika virus protein as the outgroup. Which are the most closely related Dengue virus proteins, based on the tree? What extra information does this tree tell you, compared to the unrooted tree in Q2?
Slide18
Answers
Question 1:
Summary of viral proteins and UniProt accession numbers:
UniProt: Q9YRR4 Dengue virus 1 NS1 protein
UniProt: Q9YP96 Dengue virus 2 NS1 protein
UniProt: B0LSS3 Dengue virus 3 NS1 protein
UniProt: Q6TFL5 Dengue virus 4 NS1 protein
seqnames <- c("Q9YRR4", "Q9YP96", "B0LSS3", "Q6TFL5")
choosebank("swissprot") # selects database for query
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)
mydist
Slide19
Answers
Q1. The distance matrix is as follows
The most distant are Q9YP96 (V2) and Q6TFL5 (V4), with a genetic distance of 0.333, while the most closely related are Q9YP96 (V2) and B0LSS3 (V3), with a genetic distance of 0.227

         Q6TFL5   Q9YRR4   Q9YP96
Q9YRR4   0.306
Q9YP96   0.333    0.254
B0LSS3   0.297    0.230    0.227
Slide20
Answers
Question 2:
library("ape")
mytree <- nj(mydist)
# plotting unrooted tree based on alignment of whole protein sequences
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 1.2, no.margin = TRUE, srt = 0)
# clean the sequences from gaps: keep only the region from motif DMGY to motif GEDG
seqs_trim <- seqs
for (i in seq_along(seqs)) {
  # regexpr() returns -1 if the motif is not found in a sequence
  start <- regexpr("DMGY", paste(seqs_trim[[i]], collapse = ""))[1]
  stop <- regexpr("GEDG", paste(seqs_trim[[i]], collapse = ""))[1]
  seqs_trim[[i]] <- seqs_trim[[i]][start:stop]
}
Slide21
Answers
Question 2 (continued):
alignment_ape <- multipleSeqAlignment(seqnames, seqs_trim)
mydist <- dist.alignment(alignment_ape)
mydist
library("ape")
mytree <- nj(mydist)
# tree based on the best aligned portion
plot.phylo(mytree, type = "u", edge.color = "blue", edge.width = 3, cex = 1.2, no.margin = TRUE, srt = 0)
Slide22
Answers
The resulting Q2 unrooted tree (built using the full lengths of the proteins)
This unrooted tree agrees with the genetic distance matrix calculated in Q1. The tree suggests that B0LSS3 and Q9YP96 are the most closely related proteins.
To improve the quality of the tree, it is best to select a region that has a minimal number of gaps between the protein sequences
Below you can see that there are regions with lots of gaps. Let's build another tree based on the bolded (most conserved) region to see if it is the same
Alignment of proteins (the middle block, starting with DMGY, is the bolded region):
Q6TFL5   DMGCVVSWNGKELKC…KDQKAVHA   DMGYWIESSKNQTWQIEKASLIEVKTCLWPKTHTL…GMEI   RPLSEKEENMVKSQVTA
Q9YRR4   ------------------------   DMGYWIESEKNETWKLARASFIEVKTCIWPKSHTL…GMEI   -----------------
Q9YP96   DSGCVVSWKNKELKC…KDNRAVHA   DMGYWIESALNDTWKIEKASFIEVKNCHWPKSHTL…GMEI   RPLKEKEENLVNSLVTA
B0LSS3   --------------------ASHA   DMGYWIESQKNGSWKLEKASLIEVKTCTWPKSHTL…------------------------
Slide23
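The trimming idea used for the second Q2 tree, keeping only the region between two conserved motifs, can be sketched in base R as follows. The helper `trim_between` and the toy sequence are hypothetical. Unlike the loop in the Question 2 answer, which cuts at the first letter of the end motif, this sketch keeps the whole end motif and returns `NA` when a motif is missing.

```r
# Hypothetical helper: keep the part of a sequence from the first occurrence
# of start_motif up to and including the first occurrence of end_motif.
trim_between <- function(seq_string, start_motif, end_motif) {
  from <- regexpr(start_motif, seq_string, fixed = TRUE)[1]
  to   <- regexpr(end_motif, seq_string, fixed = TRUE)[1]
  # regexpr() returns -1 when the motif is absent
  if (from == -1 || to == -1 || to < from) return(NA_character_)
  substr(seq_string, from, to + nchar(end_motif) - 1)
}

toy <- "AAHADMGYWIESKNGEDGRPL"   # made-up sequence for illustration
trimmed <- trim_between(toy, "DMGY", "GEDG")
print(trimmed)   # "DMGYWIESKNGEDG"
```

Checking for a missing motif before slicing avoids silently producing a wrong subsequence when one of the proteins lacks the expected pattern.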
Answers
The resulting tree looks the same, but we achieved overall better resolution between the proteins

Best aligned portion of protein sequences used (tree built using the bolded region):
         Q6TFL5   Q9YRR4   Q9YP96
Q9YRR4   0.317
Q9YP96   0.317    0.264
B0LSS3   0.292    0.233    0.216

Whole protein sequences used:
         Q6TFL5   Q9YRR4   Q9YP96
Q9YRR4   0.306
Q9YP96   0.332    0.254
B0LSS3   0.297    0.230    0.227
Slide24
Answers
Question 3:
# Q3 building rooted tree based on Q89277 (yellow fever virus) as outgroup
library("seqinr")
library("muscle")
library("ape")
seqnames <- c("Q9YRR4", "Q9YP96", "B0LSS3", "Q6TFL5", "Q89277")
choosebank("swissprot") # selects database for query
seqs <- list()
for (i in seq_along(seqnames)) {
  query <- query(paste("AC=", seqnames[i], sep = ""))
  seqs[i] <- getSequence(query)
}
alignment_ape <- multipleSeqAlignment(seqnames, seqs)
mydist <- dist.alignment(alignment_ape)
mydist
library("ape")
mytree <- nj(mydist)
# root the tree on the outgroup; resolve.root=TRUE gives a bifurcating root
myrootedtree <- root(mytree, outgroup = "Q89277", resolve.root = TRUE)
plot.phylo(myrootedtree, type = "p", edge.color = "blue", edge.width = 3, cex = 1.2, no.margin = TRUE, srt = 0)
Slide25
Answers
This answer builds the rooted tree using yellow fever virus (Q89277) as the outgroup, rather than the Zika virus protein (Q32ZE1) named in Q3
Most closely related viruses: B0LSS3 and Q9YP96
This rooted tree tells you which of the Dengue virus NS1 proteins branched off the earliest from the ancestors. The unrooted tree does not provide ancestry information (i.e. time sequence)

         Q89277   Q6TFL5   Q9YRR4   Q9YP96
Q6TFL5   0.523
Q9YRR4   0.511    0.306
Q9YP96   0.486    0.333    0.254
B0LSS3   0.487    0.297    0.230    0.227
Figure label: outgroup (Q89277)
Slide26
References
Ape library for phylogenetic trees and ancestry with bootstrap methods
http://cran.r-project.org/web/packages/ape/ape.pdf