Finding Deletions with Exact Break Points from Noisy Low Co


2016-02-24 59K 59 0 0


Slide1

Finding Deletions with Exact Break Points from Noisy Low Coverage Paired-end Short Sequence Reads

Jin Zhang and Yufeng Wu

Department of Computer Science and Engineering

University of Connecticut

Slide2

Introduction

Structural variants: deletion and insertion (reference vs. alternative)

Data (from the 1000 Genomes Project pilot 1)

Low coverage (2-6x); Illumina (2009_08); 45 individuals in CEU population; (combined) 580 G in BAM format

Both ends mapped: abnormal insertion size

Only one end mapped: 24 G in fastq format, 250 million reads

Sequence errors & other errors (more than 99%)

Real split-reads (also contain errors)

In regions without deletions

Achieve: deletions with exact break points, efficiently

Our problem: finding exact break points of deletions efficiently using noisy low-coverage data

Abnormal insertion size (figure labels: reference, alternative, deletion, anchor)

Mean insertion size + 3 sds (standard deviations)
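The "mean insertion size + 3 sds" criterion can be sketched as below. This is only an illustration: the function names and the toy library are hypothetical, and the choice of the sample standard deviation is an assumption, since the slides do not say which estimator is used.

```python
from statistics import mean, stdev

def abnormal_threshold(insert_sizes, n_sds=3.0):
    """Mean insertion size plus n_sds standard deviations."""
    # Assumption: sample standard deviation (statistics.stdev).
    return mean(insert_sizes) + n_sds * stdev(insert_sizes)

def is_abnormal(insert_size, threshold):
    """A pair is flagged when its insertion size exceeds the threshold."""
    return insert_size > threshold

# Toy library of insertion sizes (made-up numbers, for illustration only).
library = [190, 200, 210, 195, 205]
threshold = abnormal_threshold(library)
```

A pair whose mates map much farther apart than this threshold is a possible sign of a deletion between them.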

Slide3

Method

Figure labels: reference, alternative, deletion, anchor, split-read, spanning pair (abnormal insertion size)

We map split-reads using the Burrows-Wheeler Transform (BWT)
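For readers unfamiliar with BWT-based mapping, here is a minimal sketch of exact backward search. It is not the authors' implementation: it builds the index naively by sorting suffixes, counts occurrences by scanning, and handles exact matches only, whereas the slides describe inexact mapping. All function names are hypothetical.

```python
from collections import Counter

def build_bwt(text):
    """Return (bwt, suffix_array) for text, using a naive suffix sort."""
    text += "$"  # sentinel, sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa

def backward_search(bwt, sa, pattern):
    """Return sorted start positions of exact matches of pattern."""
    # first[c]: number of characters in the text strictly smaller than c
    counts = Counter(bwt)
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total
        total += counts[c]
    lo, hi = 0, len(bwt)
    for c in reversed(pattern):
        if c not in first:
            return []
        # bwt.count(c, 0, i) is a naive stand-in for an Occ table
        lo = first[c] + bwt.count(c, 0, lo)
        hi = first[c] + bwt.count(c, 0, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])

reference = "TACGTTTAACCATACGGCCAAAACGTAACGT"
bwt, sa = build_bwt(reference)
hits = backward_search(bwt, sa, "TTAACCAT")
```

A short pattern such as `"ACG"` returns several positions, which is the "hits not unique" situation that motivates searching locally.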

Inexact mapping

reference: CACAAT A CCCTCTCACACCAACGT T ACG

split-read: CAAT CCCTC ACGT A ACG

(figure labels: mismatch, indel)

SVs near SNPs and indels can be found; reads with errors can be used

Hits not unique (ex. hit at 2 positions on the reference)

or

Split not unique

We pre-build local BWTs with length 102k (2k for overlap) on each strand.

The anchor decides which local BWT to search.
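One possible reading of "length 102k (2k for overlap)" is that each local index covers 102 kb and consecutive windows start 100 kb apart, so neighbours share 2 kb. Under that assumption, picking the local index from the anchor position could look like the sketch below. The constants and function names are hypothetical, and the actual method may search more than one window around the anchor.

```python
WINDOW = 102_000           # length covered by one local BWT
OVERLAP = 2_000            # shared with the next window (assumed layout)
STRIDE = WINDOW - OVERLAP  # distance between window starts

def window_bounds(index, chrom_len):
    """Half-open [start, end) interval covered by local index number `index`."""
    start = index * STRIDE
    return start, min(start + WINDOW, chrom_len)

def window_for_anchor(anchor_pos):
    """Index of the window whose stride interval contains the anchor."""
    return anchor_pos // STRIDE
```

The overlap means a split-read that straddles a stride boundary still falls entirely inside one window, provided it is shorter than the overlap.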

Search locally (figure labels: reference, alternative, deletion, anchor, split-read)

Report the hits and splits with the best quality

Efficiency: ex. search near a region of 1 Mb

Slide4

Method

Calling candidate deletions

Reference: TACGTTTAACCATACGGCCAAAACGTAACGT

Split-read: TTAACCATACGTAACGT

Split as TTAACCAT + ACGTAACGT (leftmost) or TTAACCATACG + TAACGT (rightmost)
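The leftmost/rightmost ambiguity arises because the same bases (here ACG) occur at both ends of the deleted segment, so several break points give the same alternative sequence. A minimal sketch of how the leftmost and rightmost equivalent positions could be computed is shown below; the function name and the coordinate convention (0-based, deletion removes `ref[start:end]`) are assumptions for illustration, not taken from the slides.

```python
def breakpoint_range(ref, start, end):
    """Return (leftmost_start, rightmost_start) of deletions equivalent
    to removing ref[start:end]; the deletion length stays the same."""
    length = end - start
    left = start
    # Shifting left by one is equivalent when the base entering the
    # deletion equals the base leaving it on the right side.
    while left > 0 and ref[left - 1] == ref[left - 1 + length]:
        left -= 1
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right

reference = "TACGTTTAACCATACGGCCAAAACGTAACGT"
leftmost, rightmost = breakpoint_range(reference, 14, 23)
```

With the sequences from the slide, any of the equivalent deletions yields an alternative sequence that contains the split-read TTAACCATACGTAACGT.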

(1) Sort the split-reads by leftmost break point

(2) Cluster the split-reads supporting the same candidate

(3) Call candidates

Cutoff value: the minimum number of split-reads that must support a candidate
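The three steps and the cutoff can be sketched as follows. This is an illustration under an assumption the slides do not spell out: two split-reads "support the same candidate" when they share the same leftmost break point and the same deletion length. The function name and the toy input are hypothetical.

```python
from itertools import groupby

def call_candidates(split_reads, cutoff=2):
    """split_reads: iterable of (leftmost_start, deletion_length) tuples.
    Returns (leftmost_start, deletion_length, support) for each candidate
    supported by at least `cutoff` split-reads."""
    ordered = sorted(split_reads)           # (1) sort by leftmost break point
    calls = []
    for key, group in groupby(ordered):     # (2) cluster identical candidates
        support = len(list(group))
        if support >= cutoff:               # (3) call if enough support
            calls.append((key[0], key[1], support))
    return calls

# Toy input (made-up coordinates, for illustration only).
reads = [(13, 9), (500, 40), (13, 9), (13, 9), (500, 41)]
calls = call_candidates(reads, cutoff=2)
```

Lowering the cutoff to 1 keeps every split-read as its own candidate, which trades more calls for more false positives.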


Slide5

Method

Candidate validation (calling deletions)

Notice that the candidates come from split-read mapping.

Has spanning pair: if there exist spanning pairs with abnormal insertion size (which may be caused by a deletion), we validate the deletion.

No spanning pair: not enough information, so we can't validate. Either there is a deletion but low coverage left no spanning pair, or the split-read is mapped incorrectly or contains errors and there is no deletion.

(Figure labels: reference, candidate, alternative, deletion, abnormal insertion size)
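The validation rule can be sketched as a simple check for a spanning pair with abnormal insertion size. The representation of a pair as (leftmost mapped position, rightmost mapped position) and the function name are assumptions for illustration. A `False` result only means "not validated", since a missing spanning pair may simply reflect low coverage.

```python
def validate_candidate(candidate, pairs, threshold):
    """candidate: (del_start, del_end) on the reference.
    pairs: iterable of (pair_start, pair_end) for mapped read pairs.
    threshold: abnormal insertion size cutoff (e.g. mean + 3 sds)."""
    del_start, del_end = candidate
    for pair_start, pair_end in pairs:
        spans = pair_start < del_start and pair_end > del_end
        abnormal = (pair_end - pair_start) > threshold
        if spans and abnormal:
            return True   # at least one supporting spanning pair
    return False          # no evidence; the candidate stays unvalidated

# Toy example (made-up coordinates, for illustration only).
validated = validate_candidate((1000, 1500), [(900, 1100), (800, 1700)], 224)
```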

Slide6

Results

Benchmarks

The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing, Nature, 467, 1061-1073.

Mills, R.E., et al. (2011) Mapping copy number variation by population-scale genome sequencing, Nature, 470, 59-65.

Benchmark 1

Benchmark 2

We run our tests with 1000 Genomes Project releases as benchmarks; the deletions in these releases were found by multiple methods

Data

Low coverage (can be as low as 2x); Illumina (2009_08); 45 individuals in CEU population; (combined) 580 G in BAM format

Comparison

Pindel v1: exact match; max event size 1 Mb

Pindel v2: with mismatch; max event size 8092 bp

Ye, K., et al. (2009) Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads, Bioinformatics, 25, 2865-2871.

Slide7

Results

Vs. Pindel v1 (Max Event 1Mb)

What is the percentage of true deletions? We use the 1000 Genomes Project releases as benchmarks.

Figure panels (x-axis: chromosomes): total number of called deletions; true positives with precise deletions in benchmark 1; true positives with precise deletions in benchmark 2

Against the benchmarks, our method with cutoff 2 has more true positives and fewer false positives.

There might be deletions found by our method that are not in the benchmarks.

Slide8

Results

Vs. Pindel v2 (Max Event 8092 bp)

Figure panels (x-axis: chromosomes): total number of deletions found; true positives with precise deletions in benchmark 1; true positives with precise deletions in benchmark 2

The comparison with v2 shows the same trend as the comparison with v1.

Slide9

Results

Running time

Method       Inexact match                   Threads   Hours
Pindel v1    Not allowed                     1         About 10
Pindel v2    Allows mismatches               20        30, still running
Our method   Allows mismatches and indels    1         About 3.5

Data: 45 individuals on chromosome 1

Running on our Xeon server with 24 CPUs

Finding SVs with maximum event size up to and including 1 Mb

An example of inexact mapping (P1_M_061510_1 9_22)

Experiments were run on workstations supported by NSF grant IIS-0916948

Research partly supported by NSF grant IIS-0953563

