CS 4233&5263 Bioinformatics

myesha-ticknor
Uploaded On 2018-10-29

Presentation Transcript

Slide1

CS 4233&5263 Bioinformatics

Next-generation sequencing technology

Slide2

Outline

First generation sequencing

Next generation sequencing (current)

AKA:

Second generation sequencing

Massively parallel sequencing

Ultra high-throughput sequencing

Future generation sequencing

Analysis challenges

Slide3

Sanger sequencing (1st generation)

DNA is fragmented

Cloned to a plasmid vector

Cyclic sequencing reaction

Separation by electrophoresis

Readout with fluorescent tags

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Slide4

Cyclic-array methods (next-generation)

DNA is fragmented

Adaptors ligated to fragments

Several possible protocols yield array of PCR colonies.

Enzymatic extension with fluorescently tagged nucleotides.

Cyclic readout by imaging the array.

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Slide5

Available next-generation sequencing platforms

Illumina/Solexa

ABI SOLiD

Roche 454

Polonator

HeliScope

…

Slide6

Emulsion PCR

Rothberg and Leamon Nat Biotechnol. 2008

Shendure and Ji Nat Biotechnol. 2008

Fragments, with adaptors, are PCR amplified within a water drop in oil.

One primer is attached to the surface of a bead.

Used by 454, Polonator and SOLiD.

Slide7

454 Sequencing

Stats (2009 data):

read lengths 200-300 bp

accuracy problem with homopolymers

400,000 reads per run

costs $60 per megabase

Rothberg and Leamon Nat Biotechnol. 2008

Slide8

Bridge PCR

DNA fragments are flanked with adaptors.

A flat surface is coated with two types of primers, corresponding to the adaptors.

Amplification proceeds in cycles, with one end of each bridge tethered to the surface.

Used by Illumina/Solexa.

Slide9

http://www.illumina.com/pages.ilmn?ID=203

Slide10

First Round

All 4 labeled nucleotides

Primers

Polymerase

Slide11

1. Take image of first cycle

2. Remove fluorophore

3. Remove block on 3’ terminus

Slide12

Slide13

http://seq.molbiol.ru/

Stats:

(2009 data)

read lengths up to 36 bp

error rates 1-1.5%

several million “spots” per lane (8 lanes)

cost $2 per megabase

Slide14

Conventional sequencing

Sanger sequencing can read up to 1,000 bp, with per-base 'raw' accuracies as high as 99.999%. In the context of high-throughput shotgun genomic sequencing, Sanger sequencing costs on the order of $0.50 per kilobase.
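To put the quoted prices on one scale, here is a minimal Python sketch. It converts the Sanger figure ($0.50 per kilobase) to a per-megabase price and compares it with the 2009 figures quoted on the 454 and Illumina stats slides ($60 and $2 per megabase). The function name and the 3-gigabase target (roughly one pass over a human-sized genome) are illustrative assumptions, and the estimate ignores coverage depth, library preparation and labor.

```python
# Prices per megabase in USD, taken from the figures quoted in the slides.
COST_PER_MEGABASE_USD = {
    "Sanger": 0.50 * 1000,    # $0.50 per kilobase = $500 per megabase
    "Roche 454": 60.0,        # 2009 data
    "Illumina/Solexa": 2.0,   # 2009 data
}


def raw_sequencing_cost(bases: int, cost_per_megabase: float) -> float:
    """Return the cost in USD of producing `bases` raw bases (hypothetical helper)."""
    return bases / 1_000_000 * cost_per_megabase


# Illustrative target: 3 gigabases of raw sequence (an assumption, not from the slides).
target_bases = 3_000_000_000
for platform, price in COST_PER_MEGABASE_USD.items():
    print(f"{platform}: ${raw_sequencing_cost(target_bases, price):,.0f}")
```

At these prices, 3 gigabases of raw sequence comes to about $1,500,000 with Sanger, $180,000 with 454 and $6,000 with Illumina.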

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Slide15

Sequencing cost (2014)

http://www.genome.gov/sequencingcosts/

Slide16

Sequence qualities

In most cases, the quality is poorest toward the ends, with a region of high quality in the middle

Uses of sequence qualities

‘Trimming’ of reads

Removal of low quality ends

Consensus calling in sequence assembly

Confidence metric for variant discovery
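As an illustration of trimming, the sketch below removes bases from both ends of a read until it reaches a base at or above a quality threshold. The function name, the threshold of 20 and the example read are assumptions made for illustration; production trimmers typically use more robust rules, such as sliding-window averages.

```python
from typing import List, Tuple


def trim_low_quality_ends(seq: str, quals: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Trim bases with quality below min_q from both ends of a read.

    Low-quality bases in the interior of the read are left untouched.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have the same length")
    start, end = 0, len(seq)
    # Advance from the 5' end past low-quality bases.
    while start < end and quals[start] < min_q:
        start += 1
    # Retreat from the 3' end past low-quality bases.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


# Hypothetical read: poor quality at both ends, good quality in the middle.
read = "ACGTACGTAC"
qualities = [5, 12, 30, 35, 38, 40, 33, 25, 10, 4]
print(trim_low_quality_ends(read, qualities))  # ('GTACGT', [30, 35, 38, 40, 33, 25])
```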

In general, newer approaches produce larger amounts of sequences that are shorter and of lower per-base quality

Next-generation sequencing has an error rate of around 1% or higher

Slide17

Phred Quality Score

q = -10 × log10(p), where p = error probability for the base

If p = 0.01 (1% chance of error), then q = 20

If p = 0.00001 (99.999% accuracy), then q = 50
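The two worked values can be checked with a short Python sketch based on the standard Phred definition, q = -10 × log10(p). The function names are illustrative, not part of any particular toolkit.

```python
import math


def phred_quality(p: float) -> int:
    """Convert a per-base error probability into a Phred quality score,
    rounded to the nearest integer."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return round(-10.0 * math.log10(p))


def error_probability(q: int) -> float:
    """Convert a Phred quality score back into an error probability."""
    return 10.0 ** (-q / 10.0)


print(phred_quality(0.01))     # 20
print(phred_quality(0.00001))  # 50
```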

Phred quality values are rounded to the nearest integer

Slide18

Main Illumina noise factors

Schematic representation of main Illumina noise factors.

(a–d) A DNA cluster comprises identical DNA templates (colored boxes) that are attached to the flow cell. Nascent strands (black boxes) and DNA polymerase (black ovals) are depicted.

(a) In the ideal situation, after several cycles the signal (green arrows) is strong, coherent and corresponds to the interrogated position.

(b) Phasing noise introduces lagging (blue arrows) and leading (red arrow) nascent strands, which transmit a mixture of signals.

(c) Fading is attributed to loss of material that reduces the signal intensity.

(d) Changes in the fluorophore cross-talk cause misinterpretation of the received signal (blue arrows). For simplicity, the noise factors are presented separately from each other.
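To make the phasing effect in panel (b) concrete, here is a toy Python model. It is not the model used in the cited paper. It assumes that in every cycle each strand independently lags (fails to extend) or leads (extends by two bases) with small fixed probabilities, and it tracks what fraction of the cluster is still reporting the interrogated position. The function name and the rates `p_lag` and `p_lead` are hypothetical values chosen for illustration.

```python
def in_phase_fraction(cycles: int, p_lag: float = 0.005, p_lead: float = 0.005) -> float:
    """Fraction of strands in a cluster that are still in phase after `cycles` cycles.

    The state is a distribution over offsets from the ideal position:
    -1 per lag event, +1 per lead event, 0 means in phase.
    """
    dist = {0: 1.0}  # before the first cycle every strand is in phase
    steps = ((-1, p_lag), (0, 1.0 - p_lag - p_lead), (1, p_lead))
    for _ in range(cycles):
        nxt = {}
        for offset, frac in dist.items():
            for step, prob in steps:
                nxt[offset + step] = nxt.get(offset + step, 0.0) + frac * prob
        dist = nxt
    return dist[0]


# The coherent share of the signal shrinks as the run gets longer.
for n in (1, 10, 36):
    print(n, round(in_phase_fraction(n), 3))
```

Under these assumptions the in-phase fraction falls steadily with cycle number, which is one reason why base quality drops toward the end of a read.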

Erlich et al. Nature Methods 5: 679-682 (2008)

Slide19

Comparison of existing methods

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Slide20

Read length and pairing

Short reads are problematic, because short sequences do not map uniquely to the genome.
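The uniqueness problem can be seen on a toy example. The Python sketch below counts what fraction of all possible reads of a given length occur exactly once in a sequence. The function name and the 22-base toy genome, which contains the 7-base repeat GATTACA twice, are assumptions made for illustration.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of read-length substrings that occur exactly once in `genome`."""
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    return sum(1 for r in reads if counts[r] == 1) / len(reads)


# Hypothetical toy genome with GATTACA repeated at two positions.
toy_genome = "GATTACACCGTGATTACATTGC"

for length in (3, 5, 8):
    print(length, unique_fraction(toy_genome, length))
```

Reads shorter than the repeat can fall entirely inside it and match two locations, whereas every 8-base read in this toy genome is unique because it must extend past the repeat.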

Solution #1: Get longer reads.

Solution #2: Get paired reads.

ACTTAAGGCTGACTAGC

TCGTACCGATATGCTG

Slide21

Third generation

Single-molecule sequencing

no DNA amplification is involved

Helicos HeliScope

Pacific Biosciences SMRT

Longer reads

Roche/454 > 400 bp

Illumina/Solexa > 100 bp

Pacific Biosciences > 1000 bp and single molecule

Slide22

Applications of next-generation sequencing

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

Slide23

Analysis tasks

Base calling

Mapping to a reference genome

De novo or assisted genome assembly

Slide24

References

Next-generation DNA sequencing, Jay Shendure and Hanlee Ji, Nature Biotechnology 26, 1135-1145 (2008)

Next-Generation DNA Sequencing Methods, Elaine R. Mardis, Annu. Rev. Genomics Hum. Genet. 9, 387-402 (2008)