/
Tuning Tophat2 Tuning Tophat2

Tuning Tophat2 - PowerPoint Presentation

conchita-marotz
conchita-marotz . @conchita-marotz
Follow
375 views
Uploaded On 2017-06-05

Tuning Tophat2 - PPT Presentation

Belinda Giardine Tophat2 Aligns reads from RNA to the genome Ribonucleic acid RNA is a ubiquitous family of large biological molecules that perform multiple vital roles in the coding decoding regulation and expression of genes ID: 556220

amp worker threads std worker amp std threads segment juncs reads partner offsets seg vector run left 135 read

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "Tuning Tophat2" is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

Tuning Tophat2

Belinda GiardineSlide2

Tophat2

Aligns reads from RNA to the genome

Ribonucleic acid (RNA) is a ubiquitous family of large biological molecules that perform multiple vital roles in the coding, decoding, regulation, and expression of genes.

Adds on dealing with gaps in the alignments by breaking the reads into small pieces ~20 bases and reassembling the reads after mapping.

Though the new version is more parallel still slow (more than 4 days for recent runs)

It uses Bowtie to do the actual mapping Slide3

RNA-

seq

image from

wikipedia

fastq

file, a single read:

@DGM97JN1:330:C3EW0ACXX:1:1101:2723:1993 1:N:0:NAAGGC

GAATGCCCCCGGCCGTCCCTCTTAATCATGGCCTCAGTTCCGAAAACCANCAAAATAGAACCGCGGTCCTATTNN

+

CCCFFFFFHHHHGIIFGIIIIJJIIJIFGIJEHIIJIGHIJHAGHHFEE#,;;BACEEDDDDDD@B>BBDCDC##Slide4

Tophat2

Pipeline written in C++ (34,351 lines of code in 63 files)

Wrapper written in Python

3 of the programs use Boost

pthreads

long_spanning_reads.cpp

segment_juncs.cpp

tophat_reports.cpp

Programs are compiled as one unit under autoconfig and

automake

, communication between programs with temporary files.

Many prerequisites: zlib, Boost, samtools, Bowtie, this and the amount of file IO makes running on MIC only not feasible.Slide5

Data files

Reads in

fastq

format, 20–200 million reads (2

x

20gb for my test)

Reference sequence and indexes used for mapping 6gb for mouseFinal output 14gb for my testSlide6

Work from last time

Compiling

start with gcc

then icc

then add –mmic (this failed in trying to get all the prerequisites)

Test run on host, using

Tophat’s log of run for time.Run on biostar(Xeon) using 8 threads took 26 hours

Run on stampede (host) using 16 threads took 19 hours, 40

mins

Run on stampede (host) using 32 threads took 24 hoursSlide7

New work

Python wrapper and long run times makes

gprof

and

vtune

difficult to profile code with.

Going from my experience in Biostar, I am starting with

segment_juncs

executable.

Keeping the temporary files that are used for passing data between programs, I ran just

segment_juncs

.

Time for

segment_junctions

run alone:

8 threads 2 hours 13 minutes

16 threads 1 hour 15 minutes (2 ½ out of 19 ½ hours total)

of this 76% is spent in the parallel section

32 threads 2 hours 12 minutes Slide8

Failed attempts

Run

vtune

on

segment_juncs

times out of full datalicense errorsCheck

loops in

par_report

that are assumed dependencies.

lines of code indicated not loops or in loops?

contradictory

lines

Offloading

threaded section of code in

segment_juncs.cpp

. Will it actually improve speed or too much file IO

?

Lots of variables to copy

File IOSlide9

Hardison LabSlide10

vec_report3

segment_juncs

.cpp(135): (col. 32) remark: loop was not

vectorized

: existence of vector dependence.

segment_juncs

.cpp(135): (col. 32) remark: vector dependence: assumed ANTI dependence between r.92068 line 135 and r.92068 line 135.segment_juncs.cpp(135): (col. 32) remark: vector dependence: assumed FLOW dependence between r.92068 line 135 and r.92068 line 135.

Line 135:

left_seg

.left

= max(0,

T.right

() - 2);Slide11

opt_report

REMOVED VAR left_mismatches.201433.0_V$78b

REMOVED PACK left_mismatches.201433.0

REMOVED VAR right_mismatches.201433.0_V$78d

REMOVED PACK right_mismatches.

201433.0Slide12

gprof output for

segment_juncs

Each sample counts as 0.01 seconds.

% cumulative self self total

time seconds seconds calls Ts/call Ts/call name

100.01 0.01 0.01

extend_from_seeds(std::vector<

SeedExtension

,

std::allocator

<

SeedExtension

> >&,

PackedSplice

const&,

std::vector

<

std::vector

<

ReadHit

,

std::allocator

<

ReadHit

> >,

std::allocator

<

std::vector

<

ReadHit

,

std::allocator

<

ReadHit

> > > > const&,

std::string

const&,

std::string

const&, unsigned long, unsigned long,

int

)

0.00 0.01 0.00 89528 0.00 0.00

pack_splice(std::string

const&,

int

,

int

, unsigned

int

)

0.00 0.01 0.00 3 0.00 0.00 __

do_global_dtors_aux

0.00 0.01 0.00 2 0.00 0.00

pack_right_splice_half(std::string

const&, unsigned

int

, unsigned

int

)Slide13

Parallel section of code:

vector<

boost::thread

*> threads;

for (

int

i

= 0;

i

<

num_threads

; ++

i

)

{

SegmentSearchWorker

worker;

worker.rt

= &

rt

;

worker.reads_fname

=

left_reads_fname

;

worker.segmap_fnames

= &

left_segmap_fnames

;

worker.partner_reads_map_fname

=

right_reads_map_fname

;

worker.seg_partner_reads_map_fname

=

right_seg_fname_for_segment_search

;

worker.juncs

= &

vseg_juncs[i

];

worker.deletions

= &

vdeletions[i

];

worker.insertions

= &

vinsertions[i

];

worker.fusions

= &

vfusions[i

];

worker.read

= READ_LEFT;

worker.partner_hit_offset

= 0;

worker.seg_partner_hit_offset

= 0;

if (

i

== 0)

{

worker.begin_id

= 0;

worker.seg_offsets

= vector<int64_t>(

left_segmap_fnames.size

(), 0);

worker.read_offset

= 0;

}

else

{

worker.begin_id

= read_ids[i-1];

worker.seg_offsets.insert(worker.seg_offsets.end

(), offsets[i-1].begin()+1, offsets[i-1].end());

worker.read_offset

= offsets[i-1][0];

if (

partner_offsets.size

() > 0)

worker.partner_hit_offset

= partner_offsets[i-1];

if (

seg_partner_offsets.size

() > 0)

worker.seg_partner_hit_offset

= seg_partner_offsets[i-1];

}

worker

.end_id

= (i+1 <

num_threads

) ?

read_ids[i

] :

std::numeric_limits

<uint64_t>::max();

//Geo debug:

//

fprintf(stderr

, "Worker %

d

:

begin_id

=%

lu

,

end_id

=%

lu\n

",

i

,

worker.begin_id

,

worker.end_id

);

if (

num_threads

> 1 &&

i

+ 1 <

num_threads

)

threads.push_back(new

boost::thread(worker

));

else

worker();

}