Slide1
Bioinformatics
Slide2
Not only small molecules and QM and MM techniques rule the world.
Slide3
Central dogma of molecular biology
The term is due to Francis Crick
The conversion DNA → protein is not direct; RNA is involved
DNA is the information store; RNA acts as messenger (mRNA), transporter (tRNA) and biomolecular nanomachine (rRNA)
source: wikipedia.org
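The DNA → RNA → protein flow can be illustrated with a minimal Python sketch. It assumes the input is the coding (sense) strand written 5’ → 3’, so transcription reduces to replacing T with U, and it uses the standard genetic code only; `transcribe` and `translate` are illustrative names, not a library API.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order (first base varies slowest).
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(coding_strand: str) -> str:
    """DNA coding strand (5'->3') to mRNA: the same sequence with U instead of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon terminates translation
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGATGAATGGTAA")
print(mrna, translate(mrna))  # AUGGAUGAAUGGUAA MDEW
```

The sketch skips what the cell actually does at both steps: RNA polymerase reads the template strand, and the ribosome has to locate the start codon itself.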
Slide4
Nucleic acids
four letters (A, C, G, T in DNA; U instead of T in RNA)
sequence – AACTAACG (5’ → 3’)
DNA – double helix
RNA – “single-stranded” helix that folds (double-helical regions, C2’-OH → secondary and tertiary motifs)
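The two strands of the double helix are complementary and antiparallel, so the partner strand of a sequence written 5’ → 3’ is obtained by complementing each base and reversing the order. A minimal Python sketch for DNA, assuming only A–T and C–G pairs (ambiguous bases such as N are not handled):

```python
# Watson-Crick pairing for DNA; RNA would pair A with U instead of T.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, also written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("AACTAACG"))  # CGTTAGTT
```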
Slide5
nucleoside
nucleotide
Slide6
B-DNA, A-DNA, Z-DNA
Slide7
RNA secondary motifs
Nowakowski and Tinoco, Seminars in Virology 8, 153, 1997.
Slide8
RNA
source: http://complex.upf.es/~josep/RNA.jpg, http://www.biosci.ki.se/groups/ljo/images/phe_trna_large.jpg, http://rna.ucsc.edu/rnacenter/images/70s_atrna.jpg
Slide9
Proteins
20 letters
primary structure – sequence AMNTSSTVG (N-end → C-end)
Alberts, Molecular Biology of the Cell, 5th Ed.
Slide10
secondary structure (random coil, α-helix, β-sheet, loops)
several secondary structure elements form motifs, e.g. Greek key, β-α-β, HTH (helix-turn-helix)
Slide11
tertiary structure (the arrangement of motifs into domain(s))
quaternary structure (multimeric complexes)
Slide12
Proteins
source: http://calstate.fullerton.edu/news/arts/2003/photos/protein-art.jpg
Slide13
Proteins
source: Petsko, Ringe – Protein Structure and Function
Slide14
http://www.cellsignal.com/reference/pathway/NF_kappaB.html
Slide15
Systems biology
focuses on the systematic study of complex interactions in biological systems from a new perspective: holism instead of reductionism
holism – the properties of a system cannot be determined or explained by its component parts alone
one of the goals of systems biology is to discover new emergent properties
a new field, booming since 2000, still very little covered in CZ (Czech Republic)
Slide16
Systems biology
source: wikipedia.org
Slide17
Systems biology
based on mathematical modelling of systems, control theory and cybernetics: an engineering view of complex biological systems
e.g. answers questions about the robustness of a given system when one of its parts fails
or about the response of a system to changes in environmental conditions
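As a minimal illustration of the modelling view (a toy model, not one taken from the slides), consider a hypothetical single protein produced at constant rate k and degraded at rate d·x, i.e. dx/dt = k − d·x, with steady state x* = k/d. The Python sketch integrates it with the explicit Euler method and doubles k halfway through to mimic a change in environmental conditions. The function name `simulate` and all parameter values are arbitrary assumptions.

```python
def simulate(k_before, k_after, d=1.0, t_switch=20.0, t_end=40.0, dt=0.01, x0=0.0):
    """Euler integration of dx/dt = k - d*x; k jumps from k_before to k_after at t_switch.

    Returns a list of (time, x) pairs, one after each step.
    """
    x = x0
    trajectory = []
    for step in range(int(round(t_end / dt))):
        t = step * dt
        k = k_before if t < t_switch else k_after  # environmental change at t_switch
        x += dt * (k - d * x)
        trajectory.append((t + dt, x))
    return trajectory


traj = simulate(k_before=1.0, k_after=2.0)
before = traj[1999][1]  # x after 2000 steps (t = 20), just before k changes
after = traj[-1][1]     # x at t = 40
print(round(before, 3), round(after, 3))  # 1.0 2.0 (the steady states k/d)
```

Doubling k doubles the steady state, and the system relaxes to it on a time scale of about 1/d. Robustness questions are posed the same way: perturb a parameter (or remove a component) and observe the response.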
Slide18
quantum chemistry
molecular dynamics
bioinformatics
systems biology
Slide19
Bioinformatics
application of information technology to the field of molecular biology, genomics and related biological disciplines
tremendous amount of data
the creation and advancement of databases, algorithms, computational and statistical techniques, and theory to solve problems arising from the management and analysis of biological data
Slide20
According to the definitional classification of Russian scientists, we distinguish two fields of paranormal phenomena: bioinformatics and bioenergetics. Bioinformatics (i.e. extrasensory perception, ESP) covers acquiring and exchanging information by extrasensory means (not through the normal sense organs). Essentially, we distinguish the following forms of bioinformation: hypnosis (mind control), telepathy, remote viewing, precognition, retrocognition, out-of-body experience, "seeing" with the hands or other parts of the body, inspiration and revelation.
source: http://www.esoterika.cz/clanek/2992-mimosmyslova_spionaz_dalkove_pozorovani_i_.htm
Slide21
Bioinformatics
sequence analysis (sequence bioinformatics)
structural analysis (structural bioinformatics)
functional analysis (systems biology)
Slide22
genetic code
gene
genome, genomics
large data sets
high throughput
human genome
DNA is localized mainly in the nucleus; each nucleus carries the whole genetic information
3.2 billion bp
25 000 – 30 000 genes
ca. 1.5 % codes for proteins, the rest is “junk DNA”
What is the proteome? Proteomics
Is it more difficult to study the genome or the proteome?
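To put the “large data sets” into numbers, here is a back-of-the-envelope Python sketch. It assumes an ideal 2-bit encoding (A, C, G, T only; no ambiguity codes, quality scores or compression). Under that assumption, one copy of the 3.2 billion bp genome fits into about 800 MB, and ca. 1.5 % of it, about 48 million bp, codes for proteins. The helper `pack_2bit` is a hypothetical illustration, not a standard file format.

```python
GENOME_BP = 3.2e9        # haploid human genome size quoted on the slide
CODING_FRACTION = 0.015  # ca. 1.5 % codes for proteins
BITS_PER_BASE = 2        # 4 letters -> 2 bits per base (assumption: A, C, G, T only)

storage_bytes = GENOME_BP * BITS_PER_BASE / 8
coding_bp = GENOME_BP * CODING_FRACTION
print(f"{storage_bytes / 1e6:.0f} MB, {coding_bp / 1e6:.0f} Mbp coding")  # 800 MB, 48 Mbp coding

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def pack_2bit(seq: str) -> bytes:
    """Pack a DNA string into bytes, 2 bits per base (big-endian, left-padded with zero bits)."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value.to_bytes((2 * len(seq) + 7) // 8, "big")


print(pack_2bit("ACGT"))  # b'\x1b'  (bits 00 01 10 11)
```

Because A is encoded as 00, leading A's are indistinguishable from padding. Any practical 2-bit format therefore also stores the sequence length.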
Slide23
Sequence bioinformatics
reconstruction of sequences from fragments
searching for genes and other interesting regions in the genome
junk DNA – 95 % of the human genome consists of non-coding sequences that either have no function or are not yet understood
querying huge genomes for a given sequence
comparison of genes within a species – similarities between protein functions
comparison of genes between species – evolutionary relationships of organisms (phylogenetic analysis)
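Reconstruction of a sequence from overlapping fragments can be sketched as a greedy overlap merge that repeatedly joins the two fragments with the longest suffix/prefix overlap. This is a toy version only. It assumes error-free fragments, no fragment contained in another and no repeats, whereas real assemblers rely on overlap or de Bruijn graphs. The function names and the `min_len` threshold are illustrative.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(fragments: list, min_len: int = 3) -> list:
    """Merge the pair with the longest overlap until no overlaps remain; return the contigs."""
    contigs = list(fragments)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlapping pair left
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


print(greedy_assemble(["ACGTCTGA", "CTGATACG", "TACGCC"]))  # ['ACGTCTGATACGCC']
```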
Slide24
Sequence alignment
procedure for comparing sequences
Point mutations – easy:
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT
More difficult example (gapless alignment):
ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT
However, gaps can be inserted to get something like this (gapped alignment):
ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT
insertion × deletion → indel
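The difference between the slide's alignments can be quantified by counting columns. A small Python sketch, assuming both rows are already aligned (equal length, '-' for a gap) and applied to the slide's sequences; `column_stats` is an illustrative name:

```python
def column_stats(row1: str, row2: str):
    """Count (matches, mismatches, gap columns) in a pairwise alignment."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


# Point mutations only (no gaps needed)
print(column_stats("ACGTCTGATACGCCGTATAGTCTATCT",
                   "ACGTCTGATTCGCCCTATCGTCTATCT"))  # (24, 3, 0)
# The "difficult" pair after inserting gaps
print(column_stats("ACGTCTGATACGCCGTATAGTCTATCT",
                   "----CTGATTCGC---ATCGTCTATCT"))  # (18, 2, 7)
```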
Slide25
Flavors of sequence alignment
pairwise alignment × multiple sequence alignment
Slide26
Flavors of sequence alignment
global alignment × local alignment
global – the entire sequences are aligned
local – stretches of sequence with the highest density of matches are aligned, generating islands of matches or subalignments in the aligned sequences
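Global and local alignment differ only in a few details of the dynamic-programming recurrence. Below is a compact Python sketch of the Needleman–Wunsch (global) and Smith–Waterman (local) scores. It assumes a simple linear scoring scheme (match +1, mismatch −1, gap −2, chosen here for illustration) and returns only the optimal score, without the traceback.

```python
def alignment_score(a: str, b: str, local: bool = False,
                    match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch) or local (Smith-Waterman) alignment score.

    Linear gap penalty; O(len(a) * len(b)) time, only one DP row kept in memory.
    """
    # Row 0: a prefix of b aligned against nothing costs gaps (global) or 0 (local).
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)  # local: an alignment may start anywhere
                best = max(best, score)
            curr.append(score)
        prev = curr
    return best if local else prev[-1]


print(alignment_score("ACGT", "AGT"))                          # 1  (ACGT / A-GT)
print(alignment_score("AAACGTAAA", "CCCACGTCCC", local=True))  # 4  (ACGT / ACGT)
```

Setting `local=True` changes exactly two things, and they are what separates Smith–Waterman from Needleman–Wunsch: cell scores are floored at zero, and the result is the best cell anywhere in the matrix rather than the bottom-right corner.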
Slide27
Scoring systems I
DNA and protein sequences can be aligned so that the number of identically matching pairs is maximized.
Counting the number of matches gives us a score (3 in this case).
A higher score means a better alignment. This procedure can be formalized using a substitution matrix.
A T T G - - - T
A - - G A C A T
Identity matrix
   A  T  C  G
A  1
T  0  1
C  0  0  1
G  0  0  0  1
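The count of 3 can be reproduced by scoring the alignment column by column with the identity matrix. A minimal Python sketch. It assumes gap columns score 0, since the slide does not assign gaps a value; `score_alignment` is an illustrative name.

```python
BASES = "ATCG"
# Identity matrix: 1 on the diagonal (identical pair), 0 otherwise.
IDENTITY = {(x, y): int(x == y) for x in BASES for y in BASES}


def score_alignment(row1: str, row2: str, matrix=IDENTITY, gap_score=0) -> int:
    """Sum substitution-matrix scores over aligned columns; '-' marks a gap (assumed score 0)."""
    total = 0
    for a, b in zip(row1, row2):
        total += gap_score if "-" in (a, b) else matrix[(a, b)]
    return total


print(score_alignment("ATTG---T", "A--GACAT"))  # 3
```

Swapping `IDENTITY` for another matrix, or `gap_score` for a negative penalty, changes the scoring system without touching the column loop.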
Slide28
Scoring systems II
For nucleotide sequences, the identity matrix is usually good enough.
For protein sequences, the identity matrix is not sufficient to describe biological and evolutionary processes, because amino acids are not exchanged with equal probability, as could be assumed theoretically.
For example, substitution of aspartic acid (D) by glutamic acid (E) is frequently observed, whereas a change from aspartic acid to tryptophan (W) is very rare. Why is that? Triplet-based genetic code: GAT (D) → GAA (E) needs a single nucleotide change, while GAT (D) → TGG (W) needs three. Moreover, D and E have similar properties, but D and W differ considerably: D is hydrophilic, W is hydrophobic, so a D → W mutation can greatly alter the 3D structure and consequently the function.
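The genetic-code argument can be checked directly. The sketch below builds the standard codon table and computes the smallest number of nucleotide changes that turn any codon of one amino acid into any codon of another. It only illustrates the reasoning above. Substitution matrices themselves are derived from observed substitution frequencies, not from this count. Function names are illustrative.

```python
from itertools import product

# Standard genetic code (DNA alphabet), codons in T, C, A, G order; "*" = stop.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(amino_acid: str) -> list:
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def min_point_mutations(aa1: str, aa2: str) -> int:
    """Fewest single-nucleotide changes between any codon of aa1 and any codon of aa2."""
    return min(sum(x != y for x, y in zip(c1, c2))
               for c1 in codons_for(aa1) for c2 in codons_for(aa2))


print(min_point_mutations("D", "E"))  # 1  e.g. GAT -> GAA
print(min_point_mutations("D", "W"))  # 3  GAT or GAC -> TGG
```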
Slide29
Substitution matrices
(figure: substitution matrix with amino acids grouped as small, polar; small, nonpolar; polar or acidic; basic; large, hydrophobic; aromatic)
Zvelebil, Baum, Understanding Bioinformatics.
Positive score – frequency of substitutions is greater than would have occurred by random chance.
Zero score – frequency is equal to that expected by chance.
Negative score – frequency is less than would have occurred by random chance.
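The positive/zero/negative convention follows from the usual log-odds definition s(a, b) = log(q_ab / (p_a · p_b)). Here q_ab is the observed frequency of the aligned pair a, b and p_a, p_b are background frequencies. The ratio exceeds 1, giving a positive score, exactly when the pair is seen more often than chance. The Python sketch below uses hypothetical frequencies and half-bit units (2·log2, rounded), the scaling used for BLOSUM62.

```python
import math


def log_odds_score(q_ab: float, p_a: float, p_b: float, scale: float = 2.0) -> int:
    """Rounded log-odds score in 1/scale-bit units (scale=2 -> half bits).

    q_ab: observed frequency of the aligned pair (a, b)
    p_a, p_b: background frequencies of a and b; p_a * p_b is the chance expectation
    """
    return round(scale * math.log2(q_ab / (p_a * p_b)))


# Hypothetical frequencies, background p = 0.1 for both residues (chance: 0.01)
print(log_odds_score(0.02, 0.1, 0.1))   # 2   seen twice as often as chance
print(log_odds_score(0.01, 0.1, 0.1))   # 0   as often as expected by chance
print(log_odds_score(0.005, 0.1, 0.1))  # -2  half as often as chance
```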
Slide30
Sequence database search: BLAST
“Google of the sequence world”
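BLAST owes much of its speed to a heuristic. Instead of running full dynamic programming against every database entry, it first looks up short “words” shared by the query and the database and then extends only those seeds. The Python sketch below shows just the seeding idea, with an exact k-mer index, and ranks entries by the number of word hits. The word length k = 4, the toy database and the ranking by hit count are assumptions for illustration. Real BLAST also accepts similar (not only identical) words for proteins, extends hits into scored alignments and reports E-values.

```python
from collections import Counter, defaultdict


def build_kmer_index(database: dict, k: int) -> dict:
    """Map every k-mer ("word") to the (entry name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def seed_hits(query: str, index: dict, k: int) -> list:
    """Exact word hits as (query position, entry name, entry position) triples."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for name, dpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, name, dpos))
    return hits


def rank_by_hits(query: str, database: dict, k: int = 4) -> list:
    """Rank entries by number of seed hits (a crude stand-in for BLAST scoring)."""
    index = build_kmer_index(database, k)
    return Counter(name for _, name, _ in seed_hits(query, index, k)).most_common()


database = {  # toy "database" with two entries
    "seq1": "ACGTCTGATACGCCGTATAGTCTATCT",
    "seq2": "TTTTTTTTTTTTTTTT",
}
print(rank_by_hits("GATACGCC", database))  # [('seq1', 5)]
```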
Slide31
Phylogenetic analysis
Slide32
Structural bioinformatics
the function of a chemical moiety is given by its structure
while DNA structure is “given” (double helix), RNA and proteins can adopt very different conformations (i.e. specific arrangements of atoms in 3D space)
structural bioinformatics covers
analysis of nucleic acid (NA) and protein structure
prediction of structure from sequence
Slide33
Protein structure prediction
secondary structure prediction
the conformational state of each residue is predicted as H (helix), E (extended, β-sheet) or C (coil)
accuracy: 80%
tertiary structure prediction
why?
many sequences are known, but not that many 3D structures have been solved
some proteins (e.g. transmembrane) are difficult to characterize experimentally
many proteins have a known function but an unknown structure (which is, however, needed to understand their behavior in detail)
ab initio, threading, homology modelling
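Secondary structure prediction accuracy is commonly reported as the per-residue Q3 score, the percentage of residues whose predicted state (H, E or C) equals the observed one. A minimal Python sketch; both strings are made-up examples:

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted H/E/C state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


observed = "CCHHHHHHCCEEEEECCC"   # hypothetical observed states
predicted = "CHHHHHHCCCEEEECCCC"  # hypothetical prediction
print(round(q3(predicted, observed), 1))  # 83.3  (15 of 18 residues correct)
```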
Slide34
CASP – Critical Assessment of Structure Prediction
http://predictioncenter.org/
since 1994, every 2 years; CASP10 in preparation
participants predict structures that have been solved but not yet publicly released
competition of individual groups in 3D prediction:
human groups – answer in 14 days
software (automated prediction) – answer in 48 hours