
Last lecture summary - PowerPoint Presentation

briana-ranney
Uploaded On 2016-04-26



Presentation Transcript

Slide1

Last lecture summary

Slide2

Sequencing strategies

Hierarchical genome shotgun (HGS) – Human Genome Project

“map first, sequence second”

clone-by-clone … cloning is performed twice (BAC, plasmid)

Slide3

Sequencing strategies

Whole genome shotgun (WGS) – Celera

shotgun, no mapping

Coverage – the average number of reads representing a given nucleotide in the reconstructed sequence. HGS: 8, WGS: 20

Slide4
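As a rough illustration of the coverage definition given under Sequencing strategies, the sketch below computes average coverage in two ways: from the number and length of reads (total sequenced bases divided by genome length), and by counting the reads over each position. The function names and the toy read layout are illustrative assumptions, not part of the lecture. The 3 Gbp genome size, the 600 bp read length and the 8-fold target are the figures quoted in the slides; the read count of 40 million is simply the number that yields 8-fold coverage under those figures.

```python
def average_coverage(num_reads, read_length, genome_length):
    """Expected average coverage: total sequenced bases / genome length."""
    return num_reads * read_length / genome_length


def depth_per_base(genome_length, reads):
    """Count how many reads cover each position.

    `reads` is a list of (start, length) pairs, a hypothetical layout format.
    """
    depth = [0] * genome_length
    for start, length in reads:
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth


# 40 million reads of 600 bp over a 3 Gbp genome.
hgs_like = average_coverage(40_000_000, 600, 3_000_000_000)

# Toy example: three reads on a 10 bp "genome".
toy_depth = depth_per_base(10, [(0, 5), (3, 5), (4, 6)])
toy_mean = sum(toy_depth) / len(toy_depth)
print(hgs_like, toy_depth, toy_mean)
```

Both views agree: the mean of the per-base depths equals the total number of sequenced bases divided by the genome length.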

Human genome

3 billion bp, ~20,000 – 25,000 genes

Only 1.1 – 1.4 % of the genome sequence codes for proteins.

State of completion:

best estimate – 92.3% is complete

problematic unfinished regions: centromeres, telomeres (both contain highly repetitive sequences), some unclosed gaps

It is likely that the centromeres and telomeres will remain unsequenced until new technology is developed.

The genome is stored in databases

Primary database – GenBank (http://www.ncbi.nlm.nih.gov/sites/entrez?db=nucleotide)

Additional data and annotation, tools for visualizing and searching:

UCSC (http://genome.ucsc.edu)

Ensembl (http://www.ensembl.org)

Slide5

New stuff

Slide6

Personal human genomes

Personal genomes were not sequenced in the Human Genome Project, to protect the identity of the volunteers who provided DNA samples.

The following personal genomes were available by July 2011:

Japanese male (2010, PMID: 20972442)

Korean male (2009, PMID: 19470904)

Chinese male (2008, PMID: 18987735)

Nigerian male (2008, PMID: 18987734)

J. D. Watson (2008, PMID: 18421352)

J. C. Venter (2007, PMID: 17803354)

The HGP sequence is haploid; the sequence maps of Venter and Watson, however, are diploid.

Slide7

Next generation sequencing (NGS)

The completion of the human genome was just the start of the modern DNA sequencing era: “high-throughput next generation sequencing” (NGS).

New approaches reduce time and cost.

Holy Grail of sequencing – a complete human genome for less than $1,000.

Slide8

1st and 2nd generation of sequencers

1st generation – ABI Prism 3700 (Sanger, fluorescence, 96 capillaries), used in the HGP and at Celera

The Sanger method outperforms NGS in read length (600 bp).

2nd generation – birth of HT-NGS in 2005. 454 Life Sciences developed the GS 20 sequencer, which combines PCR with pyrosequencing.

Pyrosequencing – sequencing-by-synthesis

Relies on detection of pyrophosphate release on nucleotide incorporation rather than chain termination with ddNTPs.

The release of pyrophosphate is detected by a flash of light (chemiluminescence).

Average read length: 400 bp

Roche GS-FLX 454 (successor of the GS 20) was used to sequence J. Watson’s genome.

Slide9

3rd generation

The 2nd generation still uses PCR amplification, which may introduce base sequence errors or favor certain sequences over others.

To overcome this, the emerging 3rd generation of sequencers performs single-molecule sequencing (i.e. the sequence is determined directly from one DNA molecule, with no amplification or cloning).

Compared to the 2nd generation, these instruments offer higher throughput, longer reads (~1000 bp), higher accuracy, a smaller amount of starting material and lower cost.

Slide10

Moore’s law

Slide11

[Figure, source: http://www.genome.gov/27541954. Annotations: transition to 2nd generation; 4,905 $; 0.054 $; 5,000 $]

Slide12

Illumina HiSeq X Ten

14 January 2014 – Illumina announced the new HiSeq X Ten Sequencing System.

Illumina claims it is enabling the $1,000 genome.

Uses Illumina SBS technology (sequencing-by-synthesis).

It sells for at least $10 million.

Slide13

Human Longevity

4 March 2014 – Human Longevity was founded by Craig Venter.

Its main aim: to slow down the process of aging.

The largest human DNA sequencing operation in the world, capable of processing 40,000 human genomes a year.

DNA data will be combined with other data on the health and body composition of the people whose DNA is sequenced, in the hope of gleaning insights into the molecular causes of aging and age-related illnesses like cancer and heart disease.

Equipment: 2x Illumina HiSeq X Ten

Slide14

Which genomes were sequenced?

http://www.ncbi.nlm.nih.gov/sites/genome

GOLD – Genomes OnLine Database (http://www.genomesonline.org/): information regarding complete and ongoing genome projects

Slide15

Important genomics projects

The analysis of personal genomes has demonstrated how difficult it is to draw medically or biologically relevant conclusions from individual sequences.

More genomes need to be sequenced to learn how genotype correlates with phenotype.

The 1000 Genomes project (http://www.1000genomes.org/) started in 2009. Its goal is to sequence the genomes of at least 1,000 people from around the world to create a detailed and medically useful picture of human genetic variation.

The 2nd generation of sequencers is used in 1000 Genomes.

10 000 Genomes will start soon.

Slide16

Important genomics projects

ENCODE project (ENCyclopedia Of DNA Elements, http://www.genome.gov/ENCODE/) by NHGRI

Aim: identify all functional elements in the human genome sequence.

Defined regions of the human genome corresponding to 30 Mb (1%) have been selected.

These regions serve as the foundation on which to test and evaluate the effectiveness and efficiency of a diverse set of methods and technologies for finding various functional elements in human DNA.

Slide17

Sequence Alignment

Slide18

What is sequence alignment?

Fragment overlaps:

CTTTTCAAGGCTTA
        GGCTTATTATTGC

CTTTTCAAGGCTTA
        GGCTATTATTGC

CTTTTCAAGGCTTA
        GGCT-ATTATTGC

Slide19
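The fragment-overlap example can be sketched in a few lines of code. The snippet below is a minimal illustration, assuming exact (gap-free) matching between the end of one fragment and the start of the next; the function names `longest_overlap` and `merge` are hypothetical. It finds the overlap in the first pair of fragments from the slide and shows that the second pair has no exact overlap, which is why a gap is needed there.

```python
def longest_overlap(left, right, min_length=1):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_length - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left, right, min_length=1):
    """Join two fragments on their exact overlap, or return None if none exists."""
    size = longest_overlap(left, right, min_length)
    return left + right[size:] if size else None


first = "CTTTTCAAGGCTTA"
exact = longest_overlap(first, "GGCTTATTATTGC")   # overlap "GGCTTA"
joined = merge(first, "GGCTTATTATTGC")
inexact = longest_overlap(first, "GGCTATTATTGC")  # one base missing in the overlap
print(exact, joined, inexact)
```

Real assemblers tolerate mismatches and gaps in the overlap; this exact-match version only shows the basic idea.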

What is sequence alignment?

Reads:

CCCCATGGTGGCGGCAGGTGACAG
CATGGGGGAGGATGGGGACAGTCCGG
TTACCCCATGGTGGCGGCTTGGGAAACTT
TGGCGGCTCGGGACAGTCGCGCATAAT
CCATGGTGGTGGCTGGGGATAGTA
TGAGGCAGTCGCGCATAATTCCG

Consensus, with the reads aligned beneath it:

TTACCCCATGGTGGCGGCTGGGGACAGTCGCGCATAATTCCG
   CCCCATGGTGGCGGCAGGTGACAG
      CATGGGGGAGGATGGGGACAGTCCGG
TTACCCCATGGTGGCGGCTTGGGAAACTT
           TGGCGGCTCGGGACAGTCGCGCATAAT
     CCATGGTGGTGGCTGGGGATAGTA
                   TGAGGCAGTCGCGCATAATTCCG

Slide20
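The consensus in the read example can be reproduced with a simple majority vote per column. The sketch below assumes each read is placed without gaps at a fixed offset; the offsets are not given in the transcript and were inferred by lining each read up against the consensus, so treat them as an assumption. The function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(placed_reads):
    """Majority-vote consensus from (offset, read) pairs laid out without gaps."""
    length = max(offset + len(read) for offset, read in placed_reads)
    columns = [Counter() for _ in range(length)]
    for offset, read in placed_reads:
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    # Most frequent base per column; "N" marks a column that no read covers.
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Offsets are inferred, not taken from the slide.
placed_reads = [
    (3, "CCCCATGGTGGCGGCAGGTGACAG"),
    (6, "CATGGGGGAGGATGGGGACAGTCCGG"),
    (0, "TTACCCCATGGTGGCGGCTTGGGAAACTT"),
    (11, "TGGCGGCTCGGGACAGTCGCGCATAAT"),
    (5, "CCATGGTGGTGGCTGGGGATAGTA"),
    (19, "TGAGGCAGTCGCGCATAATTCCG"),
]
result = consensus(placed_reads)
print(result)
```

With these offsets no column is tied, so the vote is unambiguous and the individual read errors are outvoted by the other reads covering the same position.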

Sequence alphabet

side chain charge at physiological pH 7.4

Name             3 letters   1 letter

Positively charged side chains
Arginine         Arg         R
Histidine        His         H
Lysine           Lys         K

Negatively charged side chains
Aspartic Acid    Asp         D
Glutamic Acid    Glu         E

Polar uncharged side chains
Serine           Ser         S
Threonine        Thr         T
Asparagine       Asn         N
Glutamine        Gln         Q

Special
Cysteine         Cys         C
Selenocysteine   Sec         U
Glycine          Gly         G
Proline          Pro         P

Hydrophobic side chains
Alanine          Ala         A
Leucine          Leu         L
Isoleucine       Ile         I
Methionine       Met         M
Phenylalanine    Phe         F
Tryptophan       Trp         W
Tyrosine         Tyr         Y
Valine           Val         V

Nucleotide bases
Adenine          A
Thymine          T
Cytosine         C
Guanine          G

Slide21

Sequence alignment

Procedure of comparing sequences

Point mutations – easy

ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGATTCGCCCTATCGTCTATCT

More difficult example

gapless alignment

ACGTCTGATACGCCGTATAGTCTATCT
CTGATTCGCATCGTCTATCT

However, gaps can be inserted to get something like this

gapped alignment

ACGTCTGATACGCCGTATAGTCTATCT
----CTGATTCGC---ATCGTCTATCT

insertion × deletion: indel

Slide22
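To make the difference between the gapless and the gapped alignment concrete, the sketch below counts matching, mismatching and gap columns for both layouts of the "more difficult example". The function name `column_counts` is hypothetical, and the gapless case assumes the shorter sequence is placed at the left end of the longer one, as in the slide.

```python
def column_counts(top, bottom):
    """Count matches, mismatches and gap columns in a pairwise alignment.

    Columns are compared position by position; zip() stops at the shorter string.
    """
    matches = mismatches = gaps = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


reference = "ACGTCTGATACGCCGTATAGTCTATCT"
gapless = "CTGATTCGCATCGTCTATCT"
gapped = "----CTGATTCGC---ATCGTCTATCT"

gapless_counts = column_counts(reference, gapless)
gapped_counts = column_counts(reference, gapped)
print(gapless_counts, gapped_counts)
```

Inserting seven gap positions raises the number of matching columns from 6 to 18, which is the trade-off an alignment scoring scheme has to weigh.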

Why align sequences – continuation

The draft human genome is available

Automated gene finding is possible

Gene: AGTACGTATCGTATAGCGTAA

What does it do?

One approach: Is there a similar gene in another species?

Align sequences with known genes

Find the gene with the “best” match

Slide23

Flavors of sequence alignment

pair-wise alignment × multiple sequence alignment

Slide24

Flavors of sequence alignment

global alignment × local alignment

global: align entire sequence

local: stretches of sequence with the highest density of matches are aligned, generating islands of matches or subalignments in the aligned sequences

Slide25
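The contrast between global and local alignment can be illustrated with a small dynamic-programming sketch. The lecture does not name an algorithm at this point; the code below uses the standard Needleman-Wunsch (global) and Smith-Waterman (local) recurrences with an assumed scoring scheme of +1 for a match, -1 for a mismatch and -1 per gap position. The function name and the two test sequences are hypothetical, chosen so that they share only the short island `ACGT`.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Optimal alignment score by dynamic programming.

    local=False: global alignment (both sequences aligned end to end).
    local=True:  local alignment (best-scoring pair of substrings).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


seq_a = "GGGACGTCCC"
seq_b = "TTACGTAA"
global_score = alignment_score(seq_a, seq_b)
local_score = alignment_score(seq_a, seq_b, local=True)
print(global_score, local_score)
```

The local score reflects only the shared island, while the global score is dragged down by the unrelated flanks that must also be aligned.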

Evolution

[Figure: common ancestors. Source: wikipedia.org]

Slide26

Evolution of sequences

The sequences are the products of molecular evolution.

When sequences share a common ancestor, they tend to exhibit similarity in their sequences, structures and biological functions.

[Diagram labels: DNA1, DNA2, Protein1, Protein2; sequence similarity, similar 3D structure, similar function]

Similar sequences produce similar proteins.

However, this statement is not a rule. See Gerlt JA, Babbitt PC. Can sequence determine function? Genome Biol. 2000;1(5), PMID: 11178260

Slide27

Homology

Over time, molecular sequences undergo random changes, some of which are selected during the process of evolution.

Selected sequences accumulate mutations and diverge over time.

Two sequences are homologous when they are descended from a common ancestor sequence.

Traces of evolution may still remain in certain portions of the sequences to allow identification of the common ancestry.

Residues performing key roles are preserved by natural selection; less crucial residues mutate more frequently.

Slide28

Orthology, paralogy I

Orthologs – homologous proteins from different species that possess the same function (e.g. corresponding kinases in a signal transduction pathway in humans and mice)

Paralogs – homologous proteins that have different functions in the same species (e.g. two kinases in different signal transduction pathways of humans)

However, the use of these terms is debated: Jensen RA. Orthologs and paralogs - we need to get it right. Genome Biol. 2001;2(8), PMID: 11532207 and references therein

Slide29

Orthology, paralogy II

Orthologs – genes separated by the event of speciation

Sequences are direct descendants of a common ancestor.

Most likely have similar domain structure, 3D structure and biological function.

Paralogs – genes separated by the event of gene duplication

Gene duplication: an extra copy of a gene. Gene duplication is a key mechanism in evolution. Once a gene is duplicated, the identical genes can undergo changes and diverge to create two different genes.

http://www.globalchange.umich.edu/globalchange1/current/lectures/speciation/speciation.html

Slide30

Gene duplication

Unequal cross-over

Entire chromosome is replicated twice

This error will result in one of the daughter cells having an extra copy of the chromosome. If this cell fuses with another cell during reproduction, it may or may not result in a viable zygote.

Retrotransposition

Sequences of DNA are copied to RNA and then back to DNA instead of being translated into proteins, resulting in extra copies of DNA being present within the cell.

Slide31

Unequal cross-over

Homologous chromosomes are misaligned during meiosis.

The probability of misalignment depends on the degree to which the chromosomes share repetitive elements.

Slide32

Comparing sequences through alignment – patterns of conservation and variation can be identified.

The degree of sequence conservation in the alignment reveals the evolutionary relatedness of different sequences.

The variation between sequences reflects the changes that have occurred during evolution in the form of substitutions and/or indels.

Identifying the evolutionary relationships between sequences helps to characterize the function of unknown sequences.

Protein sequence comparison can identify homologous sequences that descend from a common ancestor 1 billion years ago (BYA); DNA sequence comparison typically reaches back only 600 million years (MYA).
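Patterns of conservation and variation can be read directly off an alignment, column by column. The sketch below labels each column of a small alignment as conserved, substituted or containing an indel. The three-sequence alignment, the function name `classify_columns` and the label characters are illustrative assumptions, not data from the lecture.

```python
def classify_columns(alignment):
    """Label each column: '*' conserved, 'x' substitution, '-' indel."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    labels = []
    for column in zip(*alignment):
        residues = set(column)
        if "-" in residues:
            labels.append("-")   # at least one sequence has a gap here
        elif len(residues) == 1:
            labels.append("*")   # every sequence carries the same residue
        else:
            labels.append("x")   # residues differ: a substitution
    return "".join(labels)


# Hypothetical toy alignment of three sequences.
toy_alignment = ["ACGT-A", "ACGTTA", "ACCT-A"]
labels = classify_columns(toy_alignment)
conserved_fraction = labels.count("*") / len(labels)
print(labels, conserved_fraction)
```

The fraction of conserved columns gives a crude measure of relatedness; real analyses use substitution matrices and evolutionary models instead of a plain identity count.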