Slide1
Protein Structure Prediction
Slide2
Why do we want to know protein structure?
Classification
Functional Prediction
Slide3
What is protein structure?
Primary - chains of amino acids
Secondary - interaction between groups of amino acids
Tertiary - the organization in three dimensions of all the atoms in a polypeptide
Quaternary - the conformation assumed by a multimeric protein
Slide4
Proteins are chains of amino acids joined by peptide bonds
The N-Cα-C sequence is repeated throughout the protein, forming the backbone
The bonds on each side of the Cα atom are free to rotate within spatial constraints; the angles of these bonds (φ and ψ) determine the conformation of the protein backbone
The R side chains also play an important structural role
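The backbone rotation angles can be made concrete with a small calculation. The sketch below computes a signed dihedral (torsion) angle from four points in space. For a residue i, φ is conventionally the dihedral over the atoms C(i-1), N, Cα, C and ψ the dihedral over N, Cα, C, N(i+1). The function name `dihedral` and the test coordinates are illustrative choices, not taken from the slides.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3-D points (tuples of floats)."""

    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (
            a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0],
        )

    # Three consecutive bond vectors along the chain of four atoms.
    b0 = sub(p1, p0)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)

    # Normals of the two planes that share the central bond b1.
    n1 = cross(b0, b1)
    n2 = cross(b1, b2)

    # atan2 form: numerically stable and gives the sign of the rotation.
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Toy geometry: the central bond lies along the z axis.
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
quarter = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(cis, trans, quarter)
```

Applied to real backbone coordinates, one call per residue for φ and one for ψ would yield the pair of angles that describes the local backbone conformation.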
Polypeptide chain
The structure of two amino acids
Primary Structure
Slide5
Interactions that occur between the C=O and N-H groups on amino acids
Much of the protein core comprises α helices and β sheets, folded into a three-dimensional configuration:
- regular patterns of H bonds are formed between neighboring amino acids
- the amino acids have similar angles
- the formation of these structures neutralizes the polar groups on each amino acid
- the secondary structures are tightly packed in a hydrophobic environment
- Each R side group has a limited volume to occupy and a limited number of interactions with other R side groups
α helix
β sheet
Secondary Structure
Slide6
α helix
β sheet
Secondary Structure
Slide7
Other Secondary structure elements
(no standardized classification)
- loop
- random coil
- others (e.g. 3₁₀ helix, β-hairpin, paperclip)
Super-secondary structure
- In addition to secondary structure elements that apply to all proteins (e.g. α helix, β sheet) there are some simple structural motifs in some proteins
- These super-secondary structures (e.g. transmembrane domains, coiled coils, helix-turn-helix, signal peptides) can give important hints about protein function
Secondary Structure
Slide8
Structural classification of proteins (SCOP)
Class 1: mainly alpha
Class 2: mainly beta
Class 3: alpha/beta
Class 4: few secondary structures
Classification
Slide9
Alternative SCOP
Class α: only α helices
Class β: antiparallel β sheets
Class α/β: mainly β sheets with intervening α helices
Class α+β: mainly segregated α helices with antiparallel β sheets
Membrane structure: hydrophobic α helices with membrane bilayers
Multidomain: contain more than one class
More Classification
Slide10
Tertiary structure
Q: If we have all the Psi and Phi angles in a protein, do we then have enough information to describe the 3-D structure?
A: No, because the detailed packing of the amino acid side chains is not revealed from this information. However, the Psi and Phi angles do determine the entire secondary structure of a protein
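To illustrate how a (Phi, Psi) pair maps onto secondary structure, here is a deliberately crude sketch that labels a residue by the region of the Ramachandran plot it falls in. The rectangular boundaries are rough illustrative assumptions, not values from the slides; the example angles are commonly quoted typical values for an α helix and an antiparallel β sheet. Note that nothing in this input describes side-chain packing, which is why the angles alone do not give the full 3-D structure.

```python
def classify_phi_psi(phi, psi):
    """Label a backbone (phi, psi) pair, in degrees, with a coarse region name.

    The box limits below are illustrative assumptions only; real assignment
    programs use hydrogen-bond patterns and much finer criteria.
    """
    if -100 <= phi <= -30 and -80 <= psi <= -10:
        return "helix"   # right-handed alpha-helical region
    if -180 <= phi <= -90 and 90 <= psi <= 180:
        return "sheet"   # extended beta region
    return "other"       # loops, turns, left-handed region, etc.


# Commonly quoted typical angles (approximate).
examples = {
    "alpha helix": (-57, -47),
    "antiparallel beta sheet": (-139, 135),
    "left-handed region": (60, 45),
}
labels = {name: classify_phi_psi(phi, psi) for name, (phi, psi) in examples.items()}
for name, label in labels.items():
    print(f"{name}: {label}")
```

A run of consecutive residues with the same label is what a secondary-structure element looks like at this level of description.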
Protein Structure Review
Slide11
Secondary-Structure Prediction Programs
* PSI-pred
* JPRED: consensus prediction (includes many of the methods given below)
* DSC
* PREDATOR
* PHD
* ZPRED
* nnPredict
* BMERC PSA
* SSP
Slide12
The tertiary structure describes the organization in three dimensions of all the atoms in the polypeptide
The tertiary structure is determined by a combination of different types of bonding (covalent bonds, ionic bonds, hydrogen bonding, hydrophobic interactions, van der Waals forces) between the side chains
Many of these bonds are very weak and easy to break, but hundreds or thousands working together give the protein structure great stability
If a protein consists of only one polypeptide chain, this level then describes the complete structure
Tertiary Structure
Slide13
Proteins can be divided into two general classes based on their tertiary structure:
- Fibrous proteins have elongated structures, with the polypeptide chains arranged in long strands. This class of proteins serves as a major structural component of cells. Examples: silk, keratin, collagen
- Globular proteins have more compact, often irregular structures. This class of proteins includes most enzymes and most proteins involved in gene expression and regulation
Tertiary Structure
Slide14
The quaternary structure defines the conformation assumed by a multimeric protein.
The individual polypeptide chains that make up a multimeric protein are often referred to as protein subunits. Subunits are joined by ionic, hydrogen-bond and hydrophobic interactions
Example: Haemoglobin (4 subunits)
Quaternary Structures
Slide15
Common displays are (among others) cartoon, spacefill, and backbone
Structure Displays
Slide16
Software
RasMol
Cn3D
Jmol (Chime)
Slide17
Classic Approach to Determining Structure?
Determine biochemical and cellular role of protein
Purify protein
Experimentally determine 3D structure
Clone cDNA encoding protein
Obtain protein by expression
Infer function, mechanism of action
Slide18
Structural Genomics Approach?
Genomic DNA sequences
Predict protein-coding genes
Obtain protein by expression
Obtain protein in silico
Experimentally determine 3D structure
Predict 3D structure
Determine biochemical and cellular role of protein
Homology searches (PSI-BLAST)
Slide19
3-D macromolecular structures stored in databases
The most important database: the Protein Data Bank (PDB)
The PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB) and can be accessed at three different sites (plus a number of mirror sites outside the USA):
- http://rcsb.rutgers.edu/pdb (Rutgers University)
- http://www.rcsb.org/pdb/ (San Diego Supercomputer Center)
- http://tcsb.nist.gov/pdb/ (National Institute for Standards and Technology)
It is the very first “bioinformatics” database ever built
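Entries in the PDB are commonly distributed in a fixed-column text format in which each atom is one ATOM record. As a minimal sketch, the code below slices the standard column ranges for the atom name, residue, chain and x/y/z coordinates, and extracts a Cα trace (the information a "backbone" display is drawn from). The helper `make_demo_record`, the function names, and the coordinates are made up for illustration and do not come from a real PDB entry; only ATOM records are handled.

```python
def make_demo_record(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a column-aligned ATOM line for the demo (hypothetical helper)."""
    return (
        f"ATOM  {serial:5d} {name:^4s} {res_name:>3s} {chain:1s}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom(line):
    """Slice the fixed columns of one ATOM record into a dict."""
    return {
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "xyz": (
            float(line[30:38]),           # columns 31-38
            float(line[38:46]),           # columns 39-46
            float(line[46:54]),           # columns 47-54
        ),
    }


def ca_trace(pdb_text):
    """Return the coordinates of all C-alpha atoms, in file order."""
    atoms = [parse_atom(l) for l in pdb_text.splitlines() if l.startswith("ATOM")]
    return [a["xyz"] for a in atoms if a["name"] == "CA"]


# Made-up coordinates for two residues; not from a real structure.
demo = "\n".join(
    [
        make_demo_record(1, "N", "MET", "A", 1, 27.340, 24.430, 2.614),
        make_demo_record(2, "CA", "MET", "A", 1, 26.266, 25.413, 2.842),
        make_demo_record(3, "C", "MET", "A", 1, 26.913, 26.639, 3.531),
        make_demo_record(4, "CA", "GLN", "A", 2, 27.886, 26.463, 4.263),
    ]
)
trace = ca_trace(demo)
print(trace)
```

For real work an established parser would be the safer choice, since actual files contain many more record types and edge cases than this sketch covers.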
Sources of Protein Structure Information?
Slide20
Researchers have been working for decades to develop procedures for predicting protein structure that are not so time consuming and not hindered by size and solubility constraints.
As protein sequences are encoded in DNA, in principle it should be possible to translate a gene sequence into an amino acid sequence, and to predict the three-dimensional structure of the resulting chain from this amino acid sequence
Computational Modeling
Structural Prediction
Slide21
How to predict the protein structure?
Ab initio prediction of protein structure from sequence: not yet.
Problem: the information contained in protein structures lies essentially in the conformational torsion angles. Even if we only assume that every amino-acid residue has three such torsion angles, and that each of these three can only assume one of three "ideal" values (e.g., 60, 180 and -60 degrees), this still leaves us with 3^3 = 27 possible conformations per residue.
For a typical 200-amino acid protein, this would give 27^200 (roughly 1.87 x 10^286) possible conformations!
If we were able to evaluate 10^9 conformations per second, this would still keep us busy 4 x 10^259 times the current age of the universe
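The arithmetic behind these numbers can be checked in a few lines. The sketch below recomputes the conformation count exactly with Python integers and works in log10 for the time estimate. The age of the universe (about 13.8 billion years) and the seconds-per-year figure are assumptions added here; the slide does not state which values it used.

```python
import math

STATES_PER_ANGLE = 3        # "ideal" values per torsion angle (from the slide)
ANGLES_PER_RESIDUE = 3      # torsion angles per residue (from the slide)
RESIDUES = 200              # typical protein length (from the slide)
RATE_PER_SECOND = 10**9     # conformations evaluated per second (from the slide)
AGE_OF_UNIVERSE_YEARS = 1.38e10   # assumption: about 13.8 billion years
SECONDS_PER_YEAR = 3.156e7        # assumption: about 365.25 days

per_residue = STATES_PER_ANGLE ** ANGLES_PER_RESIDUE   # 3^3
conformations = per_residue ** RESIDUES                # exact big integer

# Work in log10 so the huge values stay printable as mantissa and exponent.
log10_conformations = RESIDUES * math.log10(per_residue)
log10_seconds = log10_conformations - math.log10(RATE_PER_SECOND)
log10_universe_ages = log10_seconds - math.log10(
    AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR
)


def sci(log10_value):
    """Split a log10 value into (mantissa, exponent) for scientific notation."""
    exponent = math.floor(log10_value)
    return 10 ** (log10_value - exponent), exponent


m_conf, e_conf = sci(log10_conformations)
m_ages, e_ages = sci(log10_universe_ages)
print(f"conformations per residue: {per_residue}")
print(f"total conformations: about {m_conf:.2f} x 10^{e_conf}")
print(f"search time: about {m_ages:.1f} x 10^{e_ages} ages of the universe")
```

The exact integer `conformations` is kept mainly as a cross-check on the logarithmic estimate: its decimal length should be one more than the exponent.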
There are optimized ab initio prediction algorithms available, as well as fold recognition algorithms that use threading (which compares protein folds with known fold structures from databases), but the results are still very poor
Q: Can’t we just generate all these conformations, calculate their energy and see which conformation has the lowest energy?
Computational Modeling
Slide22
Homology (comparative) modeling attempts to predict structure on the strength of a protein’s sequence similarity to another protein of known structure
Basic idea: a significant alignment of the query sequence with a target sequence from PDB is evidence that the query sequence has a similar 3-D structure (current threshold ~ 40% sequence identity). Then multiple sequence alignment and pattern analysis can be used to predict the structure of the protein
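The sequence-identity threshold can be illustrated with a short calculation on an existing pairwise alignment. This is a sketch under stated assumptions: identity is computed over aligned, non-gap columns (other definitions divide by the shorter sequence length or the full alignment length), the sequences are toy strings, and the function names are hypothetical. The 40% cutoff is the value quoted on the slide.

```python
IDENTITY_THRESHOLD = 40.0  # percent, as quoted on the slide


def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def is_modeling_candidate(aligned_query, aligned_template):
    """True if the pair meets the slide's approximate identity threshold."""
    return percent_identity(aligned_query, aligned_template) >= IDENTITY_THRESHOLD


# Toy aligned sequences (made up for illustration).
query = "MKV-LAAGIV"
template = "MKVQLSAGLV"
identity = percent_identity(query, template)
print(f"{identity:.1f}% identity, candidate: {is_modeling_candidate(query, template)}")
```

In practice the alignment itself would come from a search tool, and its statistical significance matters as much as the raw identity figure.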
Homology Modeling
Slide23
Computational modeling: summary
Partial or full sequences predicted through gene finding
Similarity search against proteins in PDB
Alignment can be used to position the amino acids of the query sequence in the same approximate 3-D structure
Find structures that have a significant level of structural similarity (but not necessarily significant sequence similarity)
If member of a family with a predicted structural fold, multiple alignment can be used for structural modeling
Inferred structural information (e.g. presence of small amino acid motifs; spacing and arrangement of amino acids; certain typical amino acid combinations associated with certain types of secondary structure) can provide clues as to the presence of active sites and regions of secondary structure
Structural analyses in the lab (X-ray crystallography, NMR)
How do we do this?
Slide24
3D Comparative Modeling
Profile Methods - match sequences to folds by describing each fold in terms of the environment of each residue in the structure
Threading Methods - match sequences to structure by considering pairwise interactions for each residue, rather than averaging them into an environmental class
HMM Methods - the equivalent state corresponds to one structurally aligned position in a structural fold, including gaps
Slide25
Structural HMM