Slide1
Fission yeast
Schizosaccharomyces pombe
Budding yeast
Saccharomyces cerevisiae (Saccharomyces = "sugar fungus")
Proteins dictate function in an organism: What happens as proteins evolve?
In our project, we'll be determining if functional homologs of S. cerevisiae Met proteins are present in S. pombe.
Slide2
This semester: Five genes from S. pombe will be transferred to S. cerevisiae.
What organism should the class study after we finish the S. pombe genes?
A look at the molecular phylogeny should help
Slide3
Are there any correlations between the kinds of amino acid substitutions observed over evolution and their chemistry?
How are bioinformatics tools used to analyze the conservation of protein sequences?
How can I identify regions of proteins that are most strongly conserved and most likely to be important for function?
Slide4
To maintain their function, proteins can't tolerate drastic changes to their shapes
Amino acid substitutions that significantly perturb the structure of a protein or alter its chemistry can cause the protein to lose function
Met16p from S. cerevisiae complexed with PAP (2OQ2)
Slide5
Recall that the final folded form of a protein is determined by its primary sequence
R (“reactive”) groups form a variety of bonds important for structure and function
Slide6
Custom view of Met16p highlights Cys
Protein: backbone view; PAP: ball-and-stick; Cysteine: space-fill
Cys-254 is in close proximity to the end-product, PAP, suggesting that it plays a role in catalysis
Cysteine is one of the most evolutionarily constrained amino acids
Slide7
Charged
  Acidic: Glu (E), Asp (D)
  Basic: Arg (R), Lys (K), His (H)
Neutral
  Polar: Asn (N), Gln (Q)
  Small: Thr (T), Gly (G), Cys (C), Ser (S), Ala (A)
  Aromatic: Tyr (Y), Trp (W), Phe (F)
  Hydrophobic: Val (V), Ile (I), Leu (L), Met (M), Pro (P), Trp (W), Phe (F)
Amino acids can be grouped according to the chemistry and size of their R groups
Slide8
Most amino acids are abbreviated by their first letter (abundant, hydrophobic ones get preference):
A  Ala  alanine
C  Cys  cysteine
G  Gly  glycine
H  His  histidine
I  Ile  isoleucine
L  Leu  leucine
M  Met  methionine
P  Pro  proline
S  Ser  serine
T  Thr  threonine
V  Val  valine
Phonetic abbreviations:
F  Phe  phenylalanine
R  Arg  arginine
Oddballs (charged, aromatic, some polar):
D  Asp  aspartic acid
E  Glu  glutamic acid
K  Lys  lysine
N  Asn  asparagine
Q  Gln  glutamine
W  Trp  tryptophan
Y  Tyr  tyrosine
The one-letter code needs to be part of a 21st-century biologist's vocabulary.
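Database entries and papers often switch between the two notations, so converting between them is useful. The sketch below is a minimal Python example. It assumes hyphen-separated three-letter codes such as `Met-Cys-Ser`, and the helper names `to_one_letter` and `to_three_letter` are hypothetical.

```python
# Standard three-letter to one-letter codes for the 20 amino acids
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(residues: str, sep: str = "-") -> str:
    """Convert hyphen-separated three-letter codes, e.g. 'Met-Cys-Ser' -> 'MCS'."""
    return "".join(THREE_TO_ONE[r.capitalize()] for r in residues.split(sep))


def to_three_letter(sequence: str, sep: str = "-") -> str:
    """Convert a one-letter sequence, e.g. 'MCS' -> 'Met-Cys-Ser'."""
    return sep.join(ONE_TO_THREE[aa] for aa in sequence.upper())


print(to_one_letter("Met-Cys-Ser"))  # MCS
print(to_three_letter("eagles"))     # Glu-Ala-Gly-Leu-Glu-Ser
```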
Slide9
The matrix assigns scores for substitutions:
The maximum score goes to the same amino acid (completely conserved, possibly essential)
Positive scores are awarded for common amino acid substitutions, in decreasing order based on their occurrence in proteins
Negative scores are assigned to unlikely substitutions
BLOSUM62 (BLOcks SUbstitution Matrix) is based on substitutions counted in conserved, ungapped blocks of aligned protein sequences; sequences sharing at least 62% identity were clustered together so that closely related sequences do not dominate the counts
Studying the evolutionary conservation of amino acids in sequences provides a sense of the importance of each amino acid to protein function
Note the high score for Cys!
The biochemical connection: higher scores are frequently correlated with conservative amino acid substitutions, based on amino acid chemistry and size
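Looking up a few BLOSUM62 values makes these patterns concrete. In each row, the identical amino acid gets the highest score. Cys paired with Cys stands out. Conservative swaps such as Ile/Leu score positive, while unlikely swaps such as Gly/Ile score negative. The sketch includes only a hypothetical 9-residue subset of the published matrix, and `score` is an illustrative helper name.

```python
# A 9-residue subset of BLOSUM62 (rows and columns in the same order)
RESIDUES = "ACDEGILSV"
ROWS = """
A  4  0 -2 -1  0 -1 -1  1  0
C  0  9 -3 -4 -3 -1 -1 -1 -1
D -2 -3  6  2 -1 -3 -4  0 -3
E -1 -4  2  5 -2 -3 -3  0 -2
G  0 -3 -1 -2  6 -4 -4  0 -3
I -1 -1 -3 -3 -4  4  2 -2  3
L -1 -1 -4 -3 -4  2  4 -2  1
S  1 -1  0  0  0 -2 -2  4 -2
V  0 -1 -3 -2 -3  3  1 -2  4
"""

# Parse the table into a {(row, column): score} dictionary
BLOSUM62_SUBSET = {}
for line in ROWS.strip().splitlines():
    row, *values = line.split()
    for col, value in zip(RESIDUES, values):
        BLOSUM62_SUBSET[(row, col)] = int(value)


def score(a: str, b: str) -> int:
    """Look up the BLOSUM62 score for replacing amino acid a with b."""
    return BLOSUM62_SUBSET[(a, b)]


print(score("C", "C"))  # 9: conserved Cys
print(score("I", "L"))  # 2: conservative hydrophobic swap
print(score("G", "I"))  # -4: unlikely substitution
```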
Slide10
Are there any correlations between the kinds of amino acid substitutions observed over evolution and their biochemistry?
How are bioinformatics tools used to analyze the conservation of protein sequences?
How can I identify regions of proteins that are most strongly conserved and most likely to be important for function?
Slide11
BLAST
BLAST is an acronym for Basic Local Alignment Search Tool, a computer algorithm for finding homologous sequences in databases
BLASTN compares nucleic acid sequences
BLASTP compares protein sequences
BLOSUM62 is the default scoring matrix for BLASTP
Slide12
$P_{ij}$ is the observed frequency of two amino acids ($i$ and $j$) replacing each other in homologous sequences
$Q_i$ and $Q_j$ are the probabilities of finding $i$ and $j$ randomly in a sequence
Score $= k \log_{10}\left(\dfrac{P_{ij}}{Q_i \, Q_j}\right)$
$k$ is a scaling factor used to produce integer values
BLOSUM62 scores relate the frequency of a particular substitution to the probability that it occurs by chance; the substitution frequencies come from conserved blocks of related proteins in which sequences at least 62% identical were clustered together
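Here is a worked example of the formula, using hypothetical frequencies. The function name `log_odds_score` and the frequencies are assumptions, and so is the default scaling factor k = 2/log10(2). That value of k makes k·log10(x) equal to 2·log2(x), which expresses scores in half-bit units, the units in which BLOSUM62 is published.

```python
import math

# Assumed scaling factor: with k = 2 / log10(2), k * log10(x) == 2 * log2(x),
# so scores come out in half-bit units.
HALF_BIT_K = 2 / math.log10(2)


def log_odds_score(p_ij: float, q_i: float, q_j: float, k: float = HALF_BIT_K) -> int:
    """Score = k * log10(P_ij / (Q_i * Q_j)), rounded to the nearest integer."""
    expected = q_i * q_j  # chance of pairing i with j at random
    return round(k * math.log10(p_ij / expected))


# Hypothetical frequencies: each amino acid occurs 5% of the time at random
print(log_odds_score(0.01, 0.05, 0.05))      # 4: seen 4x more often than chance
print(log_odds_score(0.0025, 0.05, 0.05))    # 0: seen exactly as often as chance
print(log_odds_score(0.000625, 0.05, 0.05))  # -4: seen 4x less often than chance
```

A positive score means the substitution is seen more often than chance predicts, zero means chance alone explains it, and a negative score means it is seen less often than chance.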
Slide13
Positive and negative scores suggest that amino acid changes have been selected for (positive) or against (negative) during evolution
Magnitude of the score suggests the strength of the selection
Score of zero suggests that a particular substitution can be explained by chance alone
Slide14
BLASTP begins with a query sequence (e.g. your MET sequence)
The query sequence is broken into "words" that will act as seeds in alignments
BLAST searches for matches (or synonyms) in target entries in the database
If a target entry has two or more matches to "words" from the query, the alignment is extended in both directions looking for additional similarity
Figure: query words matching a target sequence (labels: Query, Words, Word match, Target sequence)
Slide15
"Words" are integral to the BLASTP search
BLASTP uses a sliding window to identify words. Consider the sequence:
E A G L E S
BLASTP would break this down into a series of four 3-letter words:
E A G   A G L   G L E   L E S
Tip! Use a non-proportional (monospaced) font such as Courier when working with database entries.
The fonts are uglier, but the letters have a constant spacing that generates nice columns!
Next: words are given a numerical score
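The sliding window takes only a few lines of Python to sketch. The name `words` is a hypothetical helper, and the word size of 3 matches the E A G L E S example.

```python
def words(sequence: str, size: int = 3) -> list[str]:
    """Slide a window of `size` residues along the sequence, one residue at a time."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]


print(words("EAGLES"))  # ['EAG', 'AGL', 'GLE', 'LES']
```

A sequence of length n yields n − 2 three-letter words, so the 6-residue example gives four.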
Slide16
BLASTP uses the BLOSUM62 matrix as its default for assigning values to words:
E A G   5 + 4 + 6 = 15
A G L   4 + 6 + 4 = 14
G L E   6 + 4 + 5 = 15
L E S   4 + 5 + 4 = 13
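These word scores can be reproduced by adding up the BLOSUM62 diagonal (identity) values for each letter. The sketch includes only the five residues in E A G L E S, and `self_score` is a hypothetical helper name.

```python
# BLOSUM62 identity (diagonal) scores for the residues in E A G L E S
IDENTITY_SCORE = {"E": 5, "A": 4, "G": 6, "L": 4, "S": 4}


def self_score(word: str) -> int:
    """Score a word against itself by summing the BLOSUM62 diagonal values."""
    return sum(IDENTITY_SCORE[aa] for aa in word)


sequence = "EAGLES"
for i in range(len(sequence) - 2):
    word = sequence[i:i + 3]
    print(word, self_score(word))
```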
BLASTP next checks for word synonyms (1-letter replacements) with a score greater than a default threshold of 10:
E A G   K A G (11), E S G (12), E C G (11), E T G (11), E V G (11)
A G L   S G L (11), A G I (12)
G L E   G I E (13), G L D (12), G L Q (12)
L E S   I E S (11)
BLASTP will search for all of these words and synonyms in the protein database
Of the 57 possible one-letter synonyms for each word (3 positions × 19 alternative amino acids), only a small handful are statistically likely to appear in homologous proteins
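One way to enumerate synonyms is to try all 3 × 19 one-letter replacements of a word and keep those that score above the threshold against the original. The sketch includes only the five BLOSUM62 rows needed for E A G L E S. The threshold of 10 follows the slide, and the helper names `pair_score` and `synonyms` are hypothetical. The sketch returns every synonym above the threshold, so it may list more than the handful shown on the slide.

```python
ALPHABET = "ARNDCQEGHILKMFPSTWYV"

# BLOSUM62 rows for the residues in E A G L E S (columns in ALPHABET order)
ROWS = {
    "A": [4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],
    "E": [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
    "L": [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],
    "S": [1, -1, 1, 0, -1, 0, 0, 0, -1, -2, -2, 0, -1, -2, -1, 4, 1, -3, -2, -2],
}
BLOSUM62 = {(row, col): value
            for row, values in ROWS.items()
            for col, value in zip(ALPHABET, values)}


def pair_score(word: str, candidate: str) -> int:
    """Sum the BLOSUM62 scores of aligned letters in two equal-length words."""
    return sum(BLOSUM62[(a, b)] for a, b in zip(word, candidate))


def synonyms(word: str, threshold: int = 10) -> dict[str, int]:
    """Return one-letter replacements of `word` scoring above `threshold`."""
    hits = {}
    for pos, original in enumerate(word):
        for aa in ALPHABET:
            if aa == original:
                continue  # skip the unchanged word itself
            candidate = word[:pos] + aa + word[pos + 1:]
            score = pair_score(word, candidate)
            if score > threshold:
                hits[candidate] = score
    return hits


for w in ("EAG", "AGL", "GLE", "LES"):
    print(w, synonyms(w))
```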
Slide17
Sequences must have at least two word matches for further consideration
BLASTP uses word matches as a nucleus and extends them in both directions, looking for additional similarity. As BLASTP extends the alignment out from the match, it calculates a running score; extension stops when the score drops more than a threshold amount below the best score reached so far
Penalties are assigned for gaps and mismatches
Plus signs in the summary line indicate a positive BLOSUM62 value
Figure: query and target sequences aligned by extending out from the original search word (G L E S), with a summary line between them (labels: Query, Summary, Target, Word match, Target sequence)
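The extension step can be sketched as a simplified ungapped "X-drop" extension, the rule described for BLAST. Extension tracks a running score and stops in each direction once that score falls more than X below the best seen. Several choices here are hypothetical and keep the example self-contained: the +2 match / −3 mismatch scoring (in place of BLOSUM62), X = 4, the two sequences, and the helper names. Gaps and the two-word requirement are not modeled.

```python
from typing import Callable, Iterable, Tuple


def toy_score(a: str, b: str) -> int:
    """Hypothetical scoring scheme standing in for BLOSUM62: +2 match, -3 mismatch."""
    return 2 if a == b else -3


def extend_one_way(pairs: Iterable[Tuple[str, str]],
                   score: Callable[[str, str], int], x_drop: int) -> Tuple[int, int]:
    """Walk outward over residue pairs; return (best gain, residues needed to reach it)."""
    best = running = best_len = 0
    for length, (a, b) in enumerate(pairs, start=1):
        running += score(a, b)
        if running > best:
            best, best_len = running, length
        elif best - running > x_drop:
            break  # running score fell too far below the best seen so far
    return best, best_len


def extend_hit(query: str, target: str, q_start: int, t_start: int, size: int,
               score: Callable[[str, str], int] = toy_score, x_drop: int = 4):
    """Extend a word match both ways; return (total score, query start, query end)."""
    seed = sum(score(query[q_start + k], target[t_start + k]) for k in range(size))
    right, right_len = extend_one_way(
        zip(query[q_start + size:], target[t_start + size:]), score, x_drop)
    left, left_len = extend_one_way(
        zip(reversed(query[:q_start]), reversed(target[:t_start])), score, x_drop)
    return seed + left + right, q_start - left_len, q_start + size + right_len


query, target = "MKTEAGLESWPR", "AKTEAGLESWQQ"  # hypothetical sequences
total, start, end = extend_hit(query, target, q_start=5, t_start=5, size=3)
print(total, query[start:end])  # 18 KTEAGLESW
```

The word match G L E seeds the alignment. Extension to the right stops after two mismatches, and extension to the left keeps the four matching residues before it.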
Slide18
Are there any correlations between the kinds of amino acid substitutions observed over evolution and their biochemistry?
How are bioinformatics tools used to analyze the conservation of protein sequences?
How can I identify regions of proteins that are most strongly conserved and most likely to be important for function?
Slide19
Highly conserved protein sequences are often essential for function
You will compare sequences of homologous proteins from model organisms:
Escherichia coli K-12 (Gram-negative)
Caenorhabditis elegans
Mus musculus
Arabidopsis thaliana
Bacillus subtilis str. 168 (Gram-positive)
Slide20
Phylogeny.fr provides tools for preparing multiple sequence alignments and phylogenetic trees
Slide21
Multiple sequence alignments show regions of conservation
Identical amino acids are shown in blue – conservative changes in grey
Slide22
TreeDyn generates a phylogenetic tree
Bootstrap values estimate the reliability of nodes in the tree (max = 1.0)
Length of branches reflects time since divergence from a node
Length corresponds to 600 million years
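Bootstrap support is commonly estimated in three steps: resample alignment columns with replacement, rebuild a tree from each replicate, and report the fraction of replicate trees that contain a given node. The sketch below covers only the resampling step. The function name `bootstrap_replicate` and the toy alignment are hypothetical, and tree building is not included.

```python
import random


def bootstrap_replicate(alignment: list[str], rng: random.Random) -> list[str]:
    """Build one bootstrap replicate by sampling alignment columns with replacement."""
    n_columns = len(alignment[0])
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    # Every sequence uses the same sampled columns, so the rows stay aligned
    return ["".join(seq[i] for i in picks) for seq in alignment]


alignment = ["MKCLVE", "MRCIVD", "MKC-AE"]  # hypothetical aligned sequences
rng = random.Random(42)  # fixed seed so the replicates are reproducible
for _ in range(3):
    print(bootstrap_replicate(alignment, rng))
```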
Slide23
WebLogo provides a graphical depiction of multiple sequence alignments
The size of each amino acid letter reflects the frequency with which that amino acid is found at the position; note the positions of amino acids with high BLOSUM scores
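A logo-style letter height can be approximated as the letter's frequency in a column multiplied by the column's information content. The information content is log2(20) minus the Shannon entropy of the column. This sketch ignores gaps, omits the small-sample correction that WebLogo can apply, and uses a hypothetical function name `letter_heights`.

```python
import math
from collections import Counter


def letter_heights(column: str, alphabet_size: int = 20) -> dict[str, float]:
    """Height (in bits) of each residue's letter in one alignment column."""
    residues = [aa for aa in column if aa != "-"]  # ignore gaps
    counts = Counter(residues)
    freqs = {aa: n / len(residues) for aa, n in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    information = math.log2(alphabet_size) - entropy  # total stack height in bits
    return {aa: f * information for aa, f in freqs.items()}


print(letter_heights("CCCC"))  # fully conserved Cys: one tall letter, about 4.32 bits
print(letter_heights("LLIV"))  # variable column: a shorter stack split among L, I, V
```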