Slide1
RNA Secondary Structure Prediction
BMI/CS 776
www.biostat.wisc.edu/bmi776/
Spring 2018
Anthony Gitter
gitter@biostat.wisc.edu
These slides, excluding third-party material, are licensed under CC BY-NC 4.0 by Mark Craven, Colin Dewey, and Anthony Gitter
Slide2
Goals for Lecture
Key concepts
  RNA secondary structure
  Secondary structure features: stems, loops, bulges
  Pseudoknots
  Nussinov algorithm
  Adapting Nussinov to take free energy into account
Slide3
Why RNA is Interesting
Messenger RNA (mRNA) isn’t the only important class of RNA
  ribosomal RNA (rRNA)
    ribosomes are complexes that incorporate several RNA subunits in addition to numerous protein units
  transfer RNA (tRNA)
    transports amino acids to the ribosome during translation
  the spliceosome, which performs intron splicing, is a complex with several RNA units
  microRNAs and others that play regulatory roles
  many viruses (e.g. HIV) have RNA genomes
  guide RNA
    sequence complementarity determines whether to cleave DNA
Folding of an mRNA can be involved in regulating the gene’s expression
Slide4
RNA Secondary Structure
RNA is typically single stranded
Folding, in large part, is determined by base-pairing
  A-U and C-G are the canonical base pairs
  other bases will sometimes pair, especially G-U
Base-paired structure is referred to as the secondary structure of RNA
Related RNAs often have homologous secondary structure without significant sequence similarity
Slide5
tRNA Secondary Structure
[Figure: tRNA secondary structure and tertiary structure. Source: Scitable]
Slide6
Small Subunit Ribosomal RNA Secondary Structure
Slide7
6S RNA Secondary Structure
Slide8
Secondary Structure Features
[Figure: RNA secondary structure with labeled features: bulge, internal loop, stem, hairpin loop]
Slide9
Four Key Problems
Predicting RNA secondary structure
  Given: RNA sequence
  Do: predict secondary structure that sequence will fold into
Searching for instances of a given structure
  Given: an RNA sequence or its secondary structure
  Do: find sequences that will fold into a similar structure
Modeling a family of RNAs
  Given: a set of RNA sequences with similar secondary structure
  Do: construct a model that captures the secondary structure regularities of the set
Identifying novel RNA genes
  Given: a pair of homologous DNA sequences
  Do: identify subsequences that appear to have highly conserved RNA secondary structure (putative RNA genes)
Focus for today: predicting RNA secondary structure
Slide10
RNA Folding Assumption
Algorithms we’ll consider assume that base pairings do not cross
For base-paired positions i, i' and j, j', with i < i' and j < j', we must have either
  i < i' < j < j' or j < j' < i < i' (not nested)
  i < j < j' < i' or j < i < i' < j' (nested)
Can’t have i < j < i' < j' or j < i < j' < i'
[Figure: two diagrams with positions labeled i, i', j, j' and i, i', j', j]
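The non-crossing condition above can be checked mechanically. The following is a minimal sketch, assuming base pairs are given as 0-based (i, i') index tuples; the function names `pairs_cross` and `has_pseudoknot` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def pairs_cross(p, q):
    """Return True if base pairs p = (i, i') and q = (j, j') cross."""
    i, i2 = sorted(p)
    j, j2 = sorted(q)
    # Crossing means the two pairs interleave:
    # i < j < i' < j'  or  j < i < j' < i'.
    return i < j < i2 < j2 or j < i < j2 < i2


def has_pseudoknot(pairs):
    """Return True if any two base pairs in the structure cross."""
    return any(pairs_cross(p, q) for p, q in combinations(pairs, 2))


nested = [(0, 8), (1, 7)]        # i < j < j' < i'
side_by_side = [(0, 3), (4, 8)]  # i < i' < j < j'
crossing = [(0, 5), (3, 8)]      # i < j < i' < j'

print(has_pseudoknot(nested))        # False
print(has_pseudoknot(side_by_side))  # False
print(has_pseudoknot(crossing))      # True
```

Checking every pair of base pairs is quadratic in the number of pairs, which is fine for illustration but not tuned for long sequences.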
Slide11
Pseudoknots
[Figure: pseudoknot. Figure from Seliverstov et al. BMC Microbiology, 2005]
These crossings are called pseudoknots
Dynamic programming breaks down if pseudoknots are allowed
Fortunately, they are not very frequent
Modern software does support them
  Akiyama et al. 2018
Slide12
Simplest RNA Secondary Structure Task
Given:
  An RNA sequence
  The constraint that pseudoknots are not allowed
Do:
  Find a secondary structure for the RNA that maximizes the number of base pairing positions
Slide13
Predicting RNA Secondary Structure: the Nussinov Algorithm
[Nussinov et al., SIAM Journal on Applied Mathematics 1978]
Key idea:
  Do this using dynamic programming
    start with small subsequences
    progressively work to larger ones
Slide14
DP in the Nussinov Algorithm
[Figure 10.8 from textbook: DP matrix for the sequence GGGAAAUCC, indexed by i and j; each cell holds the max # of paired bases in subsequence [i, j]]
Slide15
DP in the Nussinov Algorithm
Let [DP table entry] = max # of paired bases in subsequence [i, j]
Initialization: [equation not preserved]
Recursion: [equation not preserved]
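As a minimal sketch of this dynamic program, the following fills the table in order of increasing subsequence length. It assumes the standard textbook form of the Nussinov recursion (i unpaired, j unpaired, i paired with j, and bifurcation), 0-based indices, and that only A-U and C-G count as pairs. The names `gamma`, `can_pair`, and `nussinov_fill` are illustrative, and the notation on the slide may differ.

```python
def can_pair(a, b, pairs=frozenset({"AU", "UA", "CG", "GC"})):
    """Return 1 if bases a and b can pair, else 0 (canonical pairs only)."""
    return 1 if a + b in pairs else 0


def nussinov_fill(seq):
    """Return table gamma with gamma[i][j] = max # of base pairs in seq[i..j]."""
    L = len(seq)
    # Initialization: the zero-filled table covers gamma[i][i] = 0 and the
    # empty subsequence gamma[i][i-1] = 0.
    gamma = [[0] * L for _ in range(L)]
    for n in range(2, L + 1):              # subsequence length, small to large
        for i in range(L - n + 1):
            j = i + n - 1
            best = max(
                gamma[i + 1][j],                                 # i unpaired
                gamma[i][j - 1],                                 # j unpaired
                gamma[i + 1][j - 1] + can_pair(seq[i], seq[j]),  # i pairs with j
            )
            for k in range(i + 1, j):                            # bifurcation
                best = max(best, gamma[i][k] + gamma[k + 1][j])
            gamma[i][j] = best
    return gamma


table = nussinov_fill("GGGAAAUCC")
print(table[0][len(table) - 1])  # 3
```

This sketch imposes no minimum hairpin loop length, so adjacent bases may pair. The bifurcation loop makes the fill cubic in the sequence length.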
Slide16
Nussinov Algorithm Traceback
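A traceback recovers one optimal structure by asking, starting from the cell for the whole sequence, which case of the recursion produced each value. The sketch below assumes the standard stack-based traceback for the textbook Nussinov recursion with canonical pairs only. The names `fill`, `traceback`, and `dot_bracket` are hypothetical, and the fill step is repeated so the example runs on its own.

```python
PAIRS = {"AU", "UA", "CG", "GC"}  # assumption: canonical pairs only


def fill(seq):
    """Nussinov table: g[i][j] = max # of base pairs in seq[i..j]."""
    L = len(seq)
    g = [[0] * L for _ in range(L)]
    for n in range(2, L + 1):
        for i in range(L - n + 1):
            j = i + n - 1
            d = 1 if seq[i] + seq[j] in PAIRS else 0
            g[i][j] = max(
                [g[i + 1][j], g[i][j - 1], g[i + 1][j - 1] + d]
                + [g[i][k] + g[k + 1][j] for k in range(i + 1, j)]
            )
    return g


def traceback(seq, g):
    """Return one optimal set of base pairs as sorted (i, j) tuples."""
    pairs = []
    stack = [(0, len(seq) - 1)] if seq else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        d = 1 if seq[i] + seq[j] in PAIRS else 0
        if g[i + 1][j] == g[i][j]:                 # i unpaired
            stack.append((i + 1, j))
        elif g[i][j - 1] == g[i][j]:               # j unpaired
            stack.append((i, j - 1))
        elif d and g[i + 1][j - 1] + 1 == g[i][j]:  # i pairs with j
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:                                      # bifurcation at some k
            for k in range(i + 1, j):
                if g[i][k] + g[k + 1][j] == g[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return sorted(pairs)


def dot_bracket(seq, pairs):
    """Render base pairs in dot-bracket notation."""
    s = ["."] * len(seq)
    for i, j in pairs:
        s[i], s[j] = "(", ")"
    return "".join(s)


seq = "GGGAAAUCC"
found = traceback(seq, fill(seq))
print(len(found), dot_bracket(seq, found))
```

Several structures can share the maximum score, so the order in which the cases are tested determines which one is reported.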
Slide17
Predicting RNA Secondary Structure by Energy Minimization
It’s naïve to predict folding just by maximizing the number of base pairs
However, we can generalize the key recurrence relation so that we’re minimizing free energy instead
[Equation not preserved; one term is annotated as the case that i and j are base paired]
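One simple way to see the shift from counting pairs to minimizing energy is to give each base pair an energy and take a minimum instead of a maximum. The sketch below is only an illustration: the values in `PAIR_ENERGY` are made-up placeholders, not measured parameters, and the recurrence is a direct generalization of the textbook Nussinov recursion rather than the exact equation on the slide. It assigns energy per pair only and does not model loop-dependent terms.

```python
# Hypothetical per-pair energies (arbitrary units, for illustration only).
PAIR_ENERGY = {
    "GC": -3.0, "CG": -3.0,
    "AU": -2.0, "UA": -2.0,
    "GU": -1.0, "UG": -1.0,
}


def min_energy(seq):
    """Return table E with E[i][j] = minimum total pair energy of seq[i..j]."""
    L = len(seq)
    E = [[0.0] * L for _ in range(L)]  # unpaired stretches contribute 0
    for n in range(2, L + 1):
        for i in range(L - n + 1):
            j = i + n - 1
            best = min(E[i + 1][j], E[i][j - 1])   # i unpaired or j unpaired
            e = PAIR_ENERGY.get(seq[i] + seq[j])
            if e is not None:                      # case that i and j are base paired
                best = min(best, E[i + 1][j - 1] + e)
            for k in range(i + 1, j):              # bifurcation
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E


E = min_energy("GGGAAAUCC")
print(E[0][len(E) - 1])  # -8.0
```

With these placeholder values the best structure for GGGAAAUCC uses two G-C pairs and one A-U pair. Real energy models score loops and stacked pairs, not isolated pairs.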
Slide18
Predicting RNA Secondary Structure by Energy Minimization
A sophisticated program, such as Mfold [Zuker et al.], can take into account free energy of the “local environment” of [i, j]
Slide19
Predicting RNA Secondary Structure by Energy Minimization
[Figure: four example local environments of the base pair (i, j); three of the diagrams label an inner base pair at (i+k+1, j-1), (i+k+1, j-l-1), and (i+1, j-1)]
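The local environments in the figure differ in how many unpaired bases separate the closing pair (i, j) from the next pair inside it. A minimal sketch of that distinction follows, using the feature names from the earlier slide; `classify_loop` is a hypothetical helper, and real energy models use more detailed rules (loop sizes, sequence context, multibranch loops) than this.

```python
def classify_loop(i, j, inner=None):
    """Classify the loop closed by base pair (i, j).

    inner is the next base pair (p, q) enclosed by (i, j), with
    i < p < q < j, or None when (i, j) encloses only unpaired bases.
    """
    if inner is None:
        return "hairpin loop"
    p, q = inner
    if not (i < p < q < j):
        raise ValueError("inner pair must be nested inside (i, j)")
    left = p - i - 1    # unpaired bases between i and p (k on the slide)
    right = j - q - 1   # unpaired bases between q and j (l on the slide)
    if left == 0 and right == 0:
        return "stem (stacked pair)"   # inner pair is (i+1, j-1)
    if left == 0 or right == 0:
        return "bulge"                 # e.g. inner pair is (i+k+1, j-1)
    return "internal loop"             # inner pair is (i+k+1, j-l-1)


print(classify_loop(0, 10))          # hairpin loop
print(classify_loop(0, 10, (1, 9)))  # stem (stacked pair)
print(classify_loop(0, 10, (3, 9)))  # bulge
print(classify_loop(0, 10, (3, 7)))  # internal loop
```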
Slide20
Mfold example
GGGAAAUCC
http://unafold.rna.albany.edu/
ΔG = -0.80 kcal/mol
ΔG = 0.20 kcal/mol