
Bioinformatics Lecture 1 - PowerPoint Presentation

Outlawking
Uploaded On 2022-07-28




Presentation Transcript

Slide1

Bioinformatics

Lecture 1

Slide2

DNA - the basics

Slide3

Drew Berry – DNA animations

http://www.youtube.com/watch?v=WFCvkkDSfIU&index=4&list=PL9CBBEA5A85DBCDEF

Slide4

Organisation of DNA

DNA is packed in Chromosomes

Karyotype: chromosome set of a species

Chromosomes are dynamic structures

The Human karyotype:

23 pairs of chromosomes

46 DNA molecules

Slide5


DNA replication

The ability of DNA to replicate itself is a fundamental driver of life

DNA copying is catalysed by enzymes (DNA polymerases)

The complementary strand is synthesised from a template strand, using deoxynucleotides and a primer

Synthesis is directional (5’->3’)

[Figure: a primer annealed to the template DNA strand is extended 5’->3’ by DNA polymerase using deoxyribonucleotides (dNTPs), giving a reverse-complement copy of the template]
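The template/copy relationship on this slide can be sketched in a few lines of Python. This is an illustration only: the function name `reverse_complement` and the example sequence are assumptions, not part of the lecture.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(template: str) -> str:
    """Return the strand that pairs with `template`, written 5'->3'.

    The new strand runs antiparallel to the template, so the template is
    read backwards while each base is swapped for its partner.
    """
    return "".join(COMPLEMENT[base] for base in reversed(template.upper()))


if __name__ == "__main__":
    # Hypothetical four-base template, chosen only for illustration.
    print(reverse_complement("CTGA"))
```

Reversing the string is what encodes the directionality: both strands are conventionally written 5' to 3', and they run in opposite directions.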

Slide6

The polymerase chain reaction

Replication requires a DNA polymerase

Thermostable DNA polymerase (e.g. Taq polymerase)

Efficient DNA amplification

No error correction

Kary Mullis: Nobel Prize in Chemistry, 1993

Melt DNA (94-98 °C)

Anneal primers (50-65 °C)

Elongation (72 °C)

Exponential replication
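Why the replication is exponential can be shown with a small calculation. The sketch below assumes an idealised reaction in which every cycle copies a fixed fraction of the molecules present; the function name and the `efficiency` parameter are illustrative assumptions, and real reactions plateau as reagents run out.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealised number of DNA copies after a given number of PCR cycles.

    `efficiency` is the fraction of molecules copied in each cycle
    (1.0 means perfect doubling). This is a simplifying assumption.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (10, 20, 30):
        print(f"{n} cycles: {pcr_copies(1, n):,.0f} copies from one molecule")
```

With perfect doubling, 30 melt/anneal/elongate cycles turn a single molecule into about a billion copies (2^30).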

Slide7

DNA Sequencing (Sanger)

The polymerase extension reaction is terminated by randomly incorporated dideoxynucleotides (ddNTPs)

Older methods use radiolabelled phosphate

Newer methods use ddNTPs carrying fluorescent dyes

Truncated DNA strands are separated on a gel or by capillary electrophoresis
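The logic of reading a sequence from chain-terminated fragments can be sketched as a toy simulation. It assumes an idealised experiment in which a fragment terminating at every position is present and the terminal base of each fragment is known from its label; the function names are hypothetical.

```python
import random


def sanger_fragments(new_strand: str) -> list[str]:
    """Idealised chain-termination products: one fragment ending at each position."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]


def read_sequence(fragments: list[str]) -> str:
    """Recover the sequence from an unordered pool of terminated fragments.

    Electrophoresis orders fragments by length; the labelled terminal base
    of each successive fragment gives one base call.
    """
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))


if __name__ == "__main__":
    pool = sanger_fragments("ATGCCGTA")  # hypothetical synthesised strand
    random.shuffle(pool)                 # fragments start out mixed in one tube
    print(read_sequence(pool))
```

Sorting by length plays the role of the gel or capillary, and taking the last character plays the role of detecting the label on the terminating ddNTP.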

Slide8

Next Generation Sequencing

Next generation sequencing refers to methods newer than the Sanger approach

A variety of techniques developed by different companies

DNA is generally immobilized on a solid support

Very large numbers of small reads

Multiple reads of each section of genomic DNA (e.g. 30x)

Assembling the genome becomes a significant computational problem

Some ‘single molecule’ methods do not require PCR (reduces errors)

Cost has reduced substantially: the $1000 genome!

Refs: Metzker, M. L. Sequencing Technologies — the Next Generation. Nat. Rev. Genet. 2009, 11, 31–46.
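A depth such as "30x" is the average number of reads covering each base, which follows from read count, read length and genome size. The sketch below is a back-of-the-envelope calculation; the genome size of 3.1 billion bases and the 150-base read length are illustrative assumptions, not figures from the lecture.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads covering each base: (reads x length) / genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: int, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    total_bases = target_coverage * genome_size
    return -(-total_bases // read_length)  # integer ceiling division


if __name__ == "__main__":
    GENOME_SIZE = 3_100_000_000  # assumed haploid human genome size, in bases
    READ_LENGTH = 150            # assumed short-read length, in bases
    n = reads_needed(30, READ_LENGTH, GENOME_SIZE)
    print(f"{n:,} reads of {READ_LENGTH} bases give "
          f"{mean_coverage(n, READ_LENGTH, GENOME_SIZE):.1f}x coverage")
```

Under these assumptions, hundreds of millions of short reads are needed, which is why assembly becomes a significant computational problem.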

Slide9

The Human Genome Project

Funded by US government

A draft of the human genome was published in February 2001

Project completed in 2003

Cost: US$2.7 billion in 1991 dollars

Hierarchical shotgun sequencing (genome is broken down into many smaller fragments)

Automated Sanger-type sequencing

Ref: http://www.nature.com/scitable/topicpage/dna-sequencing-technologies-key-to-the-human-828

Slide10

Human genome by function

The human genome contains about 21K genes (about 100,000 were expected!)

98% of the human genome is noncoding DNA

Noncoding DNA can code for regulatory RNAs or otherwise regulate transcription

Ref: Häggström, Wikiversity Journal of Medicine 1 (2). DOI: 10.15347/wjm/2014.008. ISSN 20018762

Slide11

The druggable genome – Current drug targets

Ref: Hopkins, A. L.; Groom, C. R. The Druggable Genome. Nat Rev Drug Discov 2002, 1, 727–730.

Slide12

The druggable genome – Human genes

Ref: Hopkins, A. L.; Groom, C. R. The Druggable Genome. Nat Rev Drug Discov 2002, 1, 727–730.

Slide13

Human genome resources

Three useful sites providing a huge number of resources such as genome browsers

NCBI: National Center for Biotechnology Information

http://www.ncbi.nlm.nih.gov/

http://www.ncbi.nlm.nih.gov/genome/guide/human/

UCSC genome browser

http://genome.ucsc.edu/

Ensembl: European site at the Sanger centre

http://www.ensembl.org

Slide14

Next-gen Sequencing Overview

Ref: http://res.illumina.com/documents/products/illumina_sequencing_introduction.pdf

Slide15

Multiple Genomes

Ref: McVean et al. An Integrated Map of Genetic Variation From 1,092 Human Genomes. Nature 2012, 491, 56-65.

Slide16

Bioinformatics

Sequencing technologies produce enormous amounts of sequence data. What do we want to do with this?

Identify genes

Identify functions of gene products (proteins)

Compare genes between species

Identify relationships (similarities) between species

Slide17

The Genetic Code

In general:

Amino acids that share the same biosynthetic pathway tend to have the same first base in their codons

Amino acids with similar physical properties have similar codons, causing conservative substitutions in the case of mutations or mistranslation
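The mapping from codons to amino acids can be sketched as a lookup table. The block below builds the standard genetic code from the usual compact 64-letter layout (bases ordered T, C, A, G); the function name `translate` and the example sequences are illustrative assumptions.

```python
from itertools import product

# Standard genetic code in the conventional compact layout: codons are
# enumerated with bases in the order T, C, A, G (third base cycling fastest)
# and each position holds the one-letter amino acid code ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTGA"))  # hypothetical short reading frame
    # GAT and GAA differ only in the third base and encode Asp (D) and
    # Glu (E), two acidic amino acids: a conservative substitution.
    print(CODON_TABLE["GAT"], CODON_TABLE["GAA"])
```

The Asp/Glu pair is one instance of the pattern described above: a single-base change that swaps one amino acid for a chemically similar one.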

Slide18

Genetic mutation

The genetic code can be changed by a variety of processes

Small scale:

Damage to DNA (radiation or chemical damage)

Translation errors

Large scale:

Duplication of sections of DNA

Deletion of sections of DNA

Transposition of sections of DNA

Slide19

The rate of genetic mutation

The mutation rate (per year or per generation) differs between species and even between different sections of the genome

Different types of mutations occur with different frequencies

The average mutation rate is estimated to be ~2.5 × 10⁻⁸ mutations per nucleotide site or 175 mutations per diploid genome per generation

Ref: Nachman, M. W.; Crowell, S. L. Estimate of the Mutation Rate Per Nucleotide in Humans. Genetics 2000, 156, 297.
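The two figures quoted above are linked by a single multiplication. The sketch below assumes a diploid genome of about 7 × 10⁹ nucleotide sites; that size is inferred from the two quoted numbers rather than stated on the slide.

```python
MUTATION_RATE_PER_SITE = 2.5e-8  # mutations per nucleotide site per generation (from the slide)
DIPLOID_GENOME_SITES = 7e9       # assumed number of sites in a diploid genome


def mutations_per_generation(rate_per_site: float, n_sites: float) -> float:
    """Expected number of new mutations per genome per generation."""
    return rate_per_site * n_sites


if __name__ == "__main__":
    expected = mutations_per_generation(MUTATION_RATE_PER_SITE, DIPLOID_GENOME_SITES)
    print(f"about {expected:.0f} new mutations per diploid genome per generation")
```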

Slide20

Amino acid substitution matrices

Substitution matrices describe the probability that one AA is converted to another and ‘accepted’

The matrix is a ‘log odds’ matrix: each entry is the logarithm of the ratio between how often a substitution (e.g. Ala to Arg) is observed and how often it would be expected by chance
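As a rough sketch of what a log-odds entry means, a score can be computed from an observed pair frequency and the background frequencies of the two amino acids. The scaling (2 × log2, then rounding) and all the frequencies below are illustrative assumptions, not values taken from the matrix on the slide.

```python
import math


def log_odds_score(observed_pair_freq: float, freq_a: float, freq_b: float,
                   scale: float = 2.0) -> int:
    """Scaled, rounded log-odds score for one amino acid pair.

    The expected frequency is taken as the product of the two background
    frequencies, which is a simplification. Positive scores mean the
    substitution is seen more often than chance, negative scores less often.
    """
    expected = freq_a * freq_b
    return round(scale * math.log2(observed_pair_freq / expected))


if __name__ == "__main__":
    # Hypothetical background frequencies for two amino acids.
    freq_ala, freq_arg = 0.08, 0.05
    print(log_odds_score(0.002, freq_ala, freq_arg))  # seen half as often as chance
    print(log_odds_score(0.016, freq_ala, freq_arg))  # seen four times as often as chance
```

Because the entries are logarithms, the scores of the individual aligned positions can simply be added to score a whole alignment.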

Slide21

PAM and BLOSUM matrices

Scoring matrices are used to:

produce sequence alignments and score similarity between two or more proteins

search a database to find sequences similar to a test sequence

Commonly used families of matrices:

PAM (Point Accepted Mutation) matrices (Dayhoff)

Derived from global alignments of entire proteins

Better for closely related proteins

BLOSUM (BLOcks SUbstitution Matrix) matrices (Henikoff and Henikoff)

Derived from local alignments of blocks of sequences

Better for evolutionarily divergent sequences

Slide22

BLAST - Searching genomes

BLAST is a rapid method for searching protein or DNA sequences in large databases

Sequences are divided into groups of k AAs or bases

PGFHJIQMQVVS → PGF, GFH, FHJ, HJI, etc (k=3)

Common or repeated sequences are discarded

Sections of exact sequence match are searched for

The sequence alignment is expanded from sections that are exact matches

BLAST can miss difficult matches
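The word-splitting and exact-match steps listed above can be sketched as follows. This is a minimal illustration of the seeding idea only, not BLAST itself: it skips the filtering of common words, the extension step and all scoring, and the function names are hypothetical.

```python
from collections import defaultdict


def kmers(sequence: str, k: int = 3) -> list[str]:
    """All overlapping words of length k, in order of position."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def index_words(subject: str, k: int = 3) -> dict[str, list[int]]:
    """Map each word in the database sequence to the positions where it starts."""
    index = defaultdict(list)
    for position, word in enumerate(kmers(subject, k)):
        index[word].append(position)
    return index


def exact_seeds(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """(query position, subject position) pairs where a word matches exactly."""
    index = index_words(subject, k)
    return [
        (q_pos, s_pos)
        for q_pos, word in enumerate(kmers(query, k))
        for s_pos in index.get(word, [])
    ]


if __name__ == "__main__":
    subject = "PGFHJIQMQVVS"       # example sequence from the slide
    print(kmers(subject)[:4])
    print(exact_seeds("GFHJ", subject))  # hypothetical short query
```

Looking words up in a prebuilt index is what makes the search fast, and it also shows why difficult matches can be missed: a pair of related sequences with no exact shared word produces no seed to extend.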

Slide23

http://blast.ncbi.nlm.nih.gov/

Slide24

Slide25

Slide26

Sequence alignment

Protein or DNA sequences can be aligned

Differences between sequences are interpreted as mutations, insertions or deletions

Substitution matrices are used to score the likelihood of a match

Alignment scores are calculated between pairs of sequences

Multiple alignments can be performed

Many alignment programs exist, e.g. Clustal and T-Coffee
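How a pairwise alignment score is calculated can be sketched with the classic Needleman-Wunsch dynamic programming recurrence for global alignment. This is a textbook technique chosen for illustration, not a description of how Clustal or T-Coffee work internally; the flat match/mismatch/gap values stand in for a substitution matrix and are assumptions.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of two sequences (Needleman-Wunsch).

    Each cell holds the best score for aligning a prefix of `a` with a
    prefix of `b`; only the previous row is kept to save memory.
    """
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b (deletion)
                current[j - 1] + gap,            # gap in a (insertion)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GATTACA"))
    print(global_alignment_score("ACGT", "AGT"))
```

Replacing the `match`/`mismatch` lookup with an entry from a PAM or BLOSUM matrix gives the protein version of the same calculation.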

Slide27

Clustal

Slide28

Sequence alignments and protein structural similarity

Sequence alignments are based on protein/DNA sequence similarity and not on structural similarity

High sequence similarity implies (but does not guarantee) structural similarity

High sequence similarity implies (but does not guarantee) similar protein function

Comparison of RMSD when pairs of similar proteins are superimposed using the sequence alignment (X axis) and the protein 3D structures (Y axis)

Ref: Kosloff, M.; Kolodny, R. Sequence-Similar, Structure-Dissimilar Protein Pairs in the PDB. Proteins 2008, 71, 891
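RMSD (root mean square deviation) measures how far apart matched atoms are after two structures have been superimposed. The sketch below assumes the coordinates are already superimposed and paired one to one; it does not perform the superposition itself, and the function name and coordinates are illustrative.

```python
import math

Point = tuple[float, float, float]


def rmsd(coords_a: list[Point], coords_b: list[Point]) -> float:
    """Root mean square deviation between two equally long lists of 3D points.

    Point i in `coords_a` is compared with point i in `coords_b`, so which
    atoms are paired (by sequence alignment or by structural alignment)
    determines the result.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates in angstroms for three matched atoms.
    model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model_b = [(0.0, 0.0, 0.0), (1.5, 1.0, 0.0), (3.0, 0.0, 1.0)]
    print(f"RMSD = {rmsd(model_a, model_b):.2f}")
```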

Slide29

Differences between sequence and structural alignment

Chain A versus chain D from PDB ID 1vr4. The two chains are 100% identical in sequence

A: Alignment by sequence

B: Alignment by structure

C: Overlaid structures

Ref: Kosloff, M.; Kolodny, R. Sequence-Similar, Structure-Dissimilar Protein Pairs in the PDB. Proteins 2008, 71, 891

Slide30

Improving sequence alignments

Adding structural information to sequence alignments can improve their quality

Slide31

Summary

This lecture should provide an overview of:

DNA sequencing and the Polymerase Chain Reaction

Genome sequencing

BLAST searching

Sequence alignments and their limitations