Aligning Reads

Uploaded On 2015-11-09



Presentation Transcript


Aligning Reads

Ramesh Hariharan

Strand Life Sciences

IISc

What is Read Alignment?

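Subject’s Genome: AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC

Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC

The Reference Genome is close to, but not quite the same as, the Subject’s Genome: the two differ at three positions, where the subject has T, A and G and the reference has G, T and A.

The reads come from the Subject’s Genome. Where do these reads match in the Reference?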
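To make "close but not quite the same" concrete, here is a minimal Python sketch. It assumes idealized, error-free reads of length 8 taken at every offset of the subject; the read length and the names `differences`, `reads_from` and `exactly_placeable` are illustrative choices, not part of the talk. It lists where the two genomes differ and which reads can still be found verbatim in the reference.

```python
SUBJECT = "AGGCTACGCATTTCCCATAAAGACCCACGCTTAAGTTC"
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def differences(a, b):
    """0-based positions where two equal-length sequences disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def reads_from(genome, length):
    """Every substring of the given length, with its start: idealized, error-free reads."""
    return [(i, genome[i:i + length]) for i in range(len(genome) - length + 1)]


def exactly_placeable(reads, reference):
    """Keep only the reads that occur verbatim somewhere in the reference."""
    return [(i, read) for i, read in reads if read in reference]


if __name__ == "__main__":
    print(differences(SUBJECT, REFERENCE))  # [11, 20, 28]
    reads = reads_from(SUBJECT, 8)
    placeable = exactly_placeable(reads, REFERENCE)
    print(f"{len(placeable)} of {len(reads)} reads match the reference exactly")
```

Reads that overlap one of the three differing positions generally cannot be placed by exact matching alone, which is what motivates mismatches and gaps.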

What does “Match” mean?

Reference Genome: AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC

Exact Match: GCTACGCA

With Mismatches: CATAAAGAC (the read has an A where the reference has a T)

With Gaps: CACTT_AGT (the _ marks a reference base, an A, that the read lacks)
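The three kinds of match can be told apart with two small computations, sketched below in Python under simple assumptions: every mismatch and every single-base gap costs 1, and the names `min_mismatches`, `min_edits` and `classify` are hypothetical. The first function tries every ungapped placement; the second is a semi-global edit distance in which the whole read must be aligned but the reference may start and end anywhere.

```python
REFERENCE = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"


def min_mismatches(read, ref):
    """Fewest mismatches over every ungapped placement of the read in ref."""
    return min(
        sum(a != b for a, b in zip(read, ref[start:start + len(read)]))
        for start in range(len(ref) - len(read) + 1)
    )


def min_edits(read, ref):
    """Fewest mismatches plus single-base gaps needed to place the whole read
    somewhere in ref (semi-global edit distance: ref may start and end anywhere)."""
    prev = [0] * (len(ref) + 1)              # empty read prefix: free start anywhere
    for i, r in enumerate(read, start=1):
        cur = [i] + [0] * len(ref)           # i read bases against nothing: i gaps
        for j, g in enumerate(ref, start=1):
            cur[j] = min(prev[j - 1] + (r != g),  # match (0) or mismatch (1)
                         prev[j] + 1,             # read base with no reference base
                         cur[j - 1] + 1)          # reference base missing from read
        prev = cur
    return min(prev)                         # free end anywhere in ref


def classify(read, ref):
    """Label a read the way the slide does."""
    mismatches = min_mismatches(read, ref)
    if mismatches == 0:
        return "exact match"
    if mismatches <= min_edits(read, ref):   # gaps buy nothing over mismatches
        return "with mismatches"
    return "with gaps"


if __name__ == "__main__":
    for read in ["GCTACGCA", "CATAAAGAC", "CACTTAGT"]:
        print(read, classify(read, REFERENCE))
    # GCTACGCA exact match / CATAAAGAC with mismatches / CACTTAGT with gaps
```

The gapped read needs two mismatches if no gap is allowed, but only one edit with a gap, which is why it is labelled "with gaps".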

Why mismatches and gaps?

The subject genome could be different from the reference.

Figure: Reads aligned to the Reference Genome. A SNP in the subject shows up as a mismatch; a deletion shows up as a gap.

The reading (sequencing) process could also be erroneous.

How many mismatches and gaps?

Short reads (~50 bases): few mismatches and gaps

Long reads (~1000 bases): many more mismatches and gaps

How do aligners fare?

BWA: very few mismatches and gaps
CoBWeb
BWA-SW: many mismatches and gaps
BowTie: only mismatches, no gaps
No paired-read handling
No handling of adaptor trimming for small RNA
Separate handling for RNASeq
BowTie2

How does an aligner work?

For simplicity, assume an exact match.

For each read, scan the entire reference genome sequence.

SLOW! Every read costs a full pass over the reference.
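The naive approach can be sketched in a few lines of Python (function names are illustrative). Each read is tried at every start position of the reference, so the total work grows with the number of reads times the length of the reference.

```python
def naive_exact_matches(read, reference):
    """Try the read at every start position of the reference.
    Cost per read: up to about len(reference) * len(read) character comparisons."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        if reference[start:start + len(read)] == read:
            hits.append(start)
    return hits


def align_all(reads, reference):
    """One full scan of the reference for every read."""
    return {read: naive_exact_matches(read, reference) for read in reads}


if __name__ == "__main__":
    print(align_all(["ACG", "TT"], "CGACGACG"))  # {'ACG': [2, 5], 'TT': []}
```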

Figure: The Reference and a tree-shaped index built over it.

Index the Reference

How can we find exact matches of a read quickly with this index?

Figure: The tree index of the Reference, used to look up a read.
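The slides show the tree only as a figure. A simple stand-in is an uncompressed suffix trie, sketched below in Python: every suffix of the reference is threaded into the tree, each node remembers which suffixes pass through it, and an exact match is found by walking one edge per read base. The class and function names are illustrative, and the index on the slides may be a compressed suffix tree rather than this trie.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next base -> child node
        self.starts = []    # reference positions of the suffixes passing through here


def build_suffix_trie(reference):
    """Insert every suffix of the reference, one node per character.
    Uncompressed, so the node count can grow quadratically with the reference length."""
    root = TrieNode()
    for start in range(len(reference)):
        node = root
        for base in reference[start:]:
            node = node.children.setdefault(base, TrieNode())
            node.starts.append(start)
    return root


def find_exact(read, root):
    """Follow one edge per read base; the node reached lists every match.
    The walk depends on the read length, not on the reference length."""
    node = root
    for base in read:
        node = node.children.get(base)
        if node is None:
            return []
    return sorted(node.starts)


if __name__ == "__main__":
    trie = build_suffix_trie("CGACGACG")
    print(find_exact("ACG", trie))  # [2, 5]
    print(find_exact("CG", trie))   # [0, 3, 6]
```

Storing a node per character of every suffix, plus position lists, is what makes a tree index over a whole genome so large.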

The problem: the index takes 24 GB.

Can this structure be compressed?

The Reference: CGAC$ (the $ marks the end of the sequence)

All its circular shifts, sorted lexicographically ($ sorts before A, C and G):

$CGAC
AC$CG
C$CGA
CGAC$
GAC$C

The last column is the BWT: reading it top to bottom gives CGA$C.
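The construction on the slide translates directly into Python. The sketch below is naive (it materializes every shift, roughly O(n² log n)); practical indexes derive the BWT from a suffix array instead. The inverse function, included to show that the BWT alone is enough to recover the reference, is an illustrative addition with a hypothetical name.

```python
def bwt(text):
    """Sort all circular shifts of text and keep the last column.
    Assumes text ends with a unique '$', which Python sorts before A, C, G and T."""
    shifts = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(shift[-1] for shift in shifts)


def inverse_bwt(last):
    """Rebuild the sorted-shift table one column at a time from the BWT alone,
    then return the row ending in '$' (the original text). Naive, for illustration."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    return next(row for row in table if row.endswith("$"))


if __name__ == "__main__":
    print(bwt("CGAC$"))          # CGA$C
    print(inverse_bwt("CGA$C"))  # CGAC$
```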

The Index: now an array instead of a tree

The Burrows-Wheeler based Index

Sampled to reduce memory at the expense of speed (Ferragina and Manzini)
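The slides do not show how the array is searched. The sketch below uses the standard FM-index backward search: a table `C` (how many characters sort before each base) and occurrence counts `occ` narrow a range of sorted suffixes one read base at a time, from the last base to the first. Positions come from a suffix array that keeps only every `sa_step`-th text position; an unsampled row is resolved by stepping backwards with the LF-mapping until a sampled row is reached, and those extra steps are the speed given up for memory. The names, the default `sa_step`, and the full (unsampled) `occ` table are simplifications for readability.

```python
def build_fm_index(text, sa_step=4):
    """text must end with a unique '$'. Returns the BWT, the C table,
    occurrence counts and a sampled suffix array."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)                  # text[-1] is '$' when i == 0
    C, total = {}, 0
    for ch in sorted(set(text)):                            # C[ch] = # characters < ch
        C[ch] = total
        total += text.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in C}            # occ[ch][k] = count in bwt[:k]
    for k, b in enumerate(bwt):
        for ch in occ:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    # Keep only entries whose text position is a multiple of sa_step (saves memory).
    sampled = {row: pos for row, pos in enumerate(sa) if pos % sa_step == 0}
    return bwt, C, occ, sampled


def backward_search(pattern, C, occ, n):
    """Half-open range [lo, hi) of sorted suffixes that start with pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0, 0
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def locate(row, bwt, C, occ, sampled):
    """Step backwards through the text with the LF-mapping until a sampled row."""
    steps = 0
    while row not in sampled:
        ch = bwt[row]
        row = C[ch] + occ[ch][row]
        steps += 1
    return sampled[row] + steps


def find_all(pattern, bwt, C, occ, sampled):
    """All start positions of pattern in the indexed text, in increasing order."""
    lo, hi = backward_search(pattern, C, occ, len(bwt))
    return sorted(locate(row, bwt, C, occ, sampled) for row in range(lo, hi))


if __name__ == "__main__":
    text = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC$"
    index = build_fm_index(text)
    print(find_all("CATAATGAC", *index))  # [15]
```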

How about mismatches and gaps?

BWA, BWA-SW and BowTie force mismatches and gaps into the BW Index searching procedure.
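The slides do not detail how each tool does this. As an illustration only, the Python sketch below forces substitutions into backward search: at each step it branches on every base, charges one mismatch when the base differs from the read, and prunes a branch once the budget is exceeded. Gaps are not handled, and real aligners prune far more aggressively (BWA, for example, uses a lower bound on the differences still needed). All names here are hypothetical.

```python
def fm_index(text):
    """Suffix array, C table and occurrence counts for text ending in a unique '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    C, total = {}, 0
    for ch in sorted(set(text)):
        C[ch] = total
        total += text.count(ch)
    occ = {ch: [0] for ch in C}                  # occ[ch][k] = count of ch in bwt[:k]
    for b in bwt:
        for ch in occ:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return sa, C, occ


def search_with_mismatches(pattern, sa, C, occ, max_mismatches):
    """Backward search that may substitute any base, within a mismatch budget."""
    hits = set()

    def extend(i, lo, hi, used):
        if lo >= hi:                              # no suffix carries this prefix
            return
        if i < 0:                                 # whole pattern consumed
            hits.update(sa[lo:hi])
            return
        for base in "ACGT":
            if base not in C:
                continue
            cost = used + (base != pattern[i])
            if cost <= max_mismatches:            # prune branches over budget
                extend(i - 1, C[base] + occ[base][lo], C[base] + occ[base][hi], cost)

    extend(len(pattern) - 1, 0, len(sa), 0)
    return sorted(hits)


if __name__ == "__main__":
    ref = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
    sa, C, occ = fm_index(ref + "$")
    print(search_with_mismatches("CATAAAGAC", sa, C, occ, 0))  # []
    print(search_with_mismatches("CATAAAGAC", sa, C, occ, 1))  # [15]
```

The branching factor grows quickly with the mismatch budget, which is one reason such searches suit reads with few mismatches and gaps.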

CoBWeb uses the BW Index to find a ‘seed’ exact match and does Smith-Waterman around this seed.

This 15-mer occurs at locations x1, x2…
This 15-mer occurs at locations x3, x4…
This whole 30-mer occurs at location x5
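The seeding idea can be sketched in Python under simplifying assumptions: a plain hash table of k-mers stands in for the BW Index, seeds are non-overlapping, and every seed hit votes for the read start it implies. One reason for cutting a read into pieces is the pigeonhole principle: if a read has at most m mismatches and is cut into m + 1 non-overlapping seeds, at least one seed matches exactly. Two 15-mer halves of a 30-mer therefore guarantee a seed hit for reads with at most one mismatch. Function names and the small k in the example are illustrative.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def seed_candidates(read, index, k):
    """Cut the read into non-overlapping k-mer seeds. An exact hit of the seed at
    read offset o and reference position p votes for the read starting at p - o."""
    votes = defaultdict(int)
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], []):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return dict(votes)


if __name__ == "__main__":
    ref = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
    index = build_kmer_index(ref, 4)
    print(seed_candidates("TGACCCAC", index, 4))   # {20: 2}: both seeds agree
    print(seed_candidates("CATAAAGAC", index, 4))  # {15: 1}: one seed survives the mismatch
```

A candidate start supported by every seed corresponds to the slide's "whole 30-mer" case; candidates with fewer votes still need a full alignment around the seed.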

Dynamic Programming

Given a location in the reference with a read anchor, how well does the read match there?

Figure: the Read laid against the Reference around a 14-mer anchor.

Smith-Waterman (optimized for large gaps)
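A textbook Smith-Waterman local alignment is sketched below in Python. The scores (match +2, mismatch -1, gap -2) and the linear gap penalty are assumptions chosen for illustration; the slides do not say how CoBWeb is optimized for large gaps (affine gap penalties are one common approach), and this sketch does not attempt to reproduce it. The window padding in the usage example is likewise hypothetical.

```python
def smith_waterman(read, ref, match=2, mismatch=-1, gap=-2):
    """Best local alignment of read against ref (linear gap penalty).
    Returns (score, aligned_ref, aligned_read) with '-' marking gaps."""
    H = [[0] * (len(ref) + 1) for _ in range(len(read) + 1)]
    best, best_cell = 0, (0, 0)
    for i in range(1, len(read) + 1):
        for j in range(1, len(ref) + 1):
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)
    # Trace back from the best cell until the score drops to zero.
    i, j = best_cell
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if read[i - 1] == ref[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # match or mismatch
            top.append(ref[j - 1])
            bottom.append(read[i - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:      # read base against a gap
            top.append("-")
            bottom.append(read[i - 1])
            i -= 1
        else:                                   # reference base missing from the read
            top.append(ref[j - 1])
            bottom.append("-")
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    ref = "AGGCTACGCATGTCCCATAATGACCCACACTTAAGTTC"
    anchor, pad = 27, 4                         # hypothetical anchor position and padding
    window = ref[max(0, anchor - pad):anchor + 8 + pad]
    score, aligned_ref, aligned_read = smith_waterman("CACTTAGT", window)
    print(score)                                # 14 (8 matches x 2, one gap -2)
    print(aligned_ref)
    print(aligned_read)
```

Restricting the dynamic program to a window around the anchor keeps its cost proportional to the read length times the window size rather than the whole reference.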

Comparison with BWA

Read lengths 50 and 150

20% faster than BWA with comparable results

CoBWeb: 3 mismatches and 2 gaps
BWA: 2 mismatches + 1 gap (possibly spanning multiple bases)

Comparison with BWA-SW

Read length 400, with 8 mismatches plus 10 gaps

              CoBWeb    BWA-SW
Reads         1m        1m
Time taken    1130s     2242s

Incorrectly mapped: 125989819

5650 were mapped incorrectly by BWA-SW.
The remainder have poor BWA mapping quality.

Avadis NGS

Avadis NGS: Alignment, DNA Variant Detection, RNASeq, ChIPSeq, Small RNASeq

Thank You