Uploaded On 2016-03-13


Slide1

MCB3895-004 Lecture #11

Sept 30/14

De novo genome assembly using Velvet; using Perl to execute system commands

Slide2

Velvet

Velvet was one of the first widely used de novo assembly programs

http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf

It is still quite common, although frequently superseded

Slide3

Velvet step #1: de Bruijn graph construction

http://en.wikipedia.org/wiki/Velvet_assembler
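
To make the kmer idea concrete, here is a minimal Perl sketch of the first step: reads are cut into overlapping kmers, and consecutive kmers (which overlap by k-1 bases) are linked. It is a simplification for illustration and not Velvet's actual data structure. The sub names and the toy reads are made up, and the third read carries one wrong base.

```perl
use strict;
use warnings;

# Return every overlapping kmer in a read, in order.
sub kmers {
    my ($read, $k) = @_;
    my @kmers;
    for (my $i = 0; $i + $k <= length $read; $i++) {
        push @kmers, substr($read, $i, $k);
    }
    return @kmers;
}

# Count how often each kmer, and each link between consecutive
# kmers, is seen across all reads.
sub build_graph {
    my ($k, @reads) = @_;
    my (%node_count, %edge_count);
    foreach my $read (@reads) {
        my @kmers = kmers($read, $k);
        $node_count{$_}++ foreach @kmers;
        for (my $i = 0; $i < $#kmers; $i++) {
            $edge_count{"$kmers[$i]>$kmers[$i+1]"}++;
        }
    }
    return (\%node_count, \%edge_count);
}

# Toy reads (hypothetical); the third differs from the others at one base.
my @reads = ("ACGTACGA", "ACGTACGA", "ACGTTCGA");
my ($nodes, $edges) = build_graph(5, @reads);
foreach my $kmer (sort keys %$nodes) {
    print "$kmer\tseen $nodes->{$kmer} time(s)\n";
}
```

In this toy data the kmers that contain the wrong base are each seen only once, while the correct kmers are seen twice. That difference in coverage is what the later tip removal and bubble popping steps rely on.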

Red bases = sequencing error

Slide4

Velvet step #2: graph simplification

Unbranched paths collapsed into single nodes

http://en.wikipedia.org/wiki/Velvet_assembler

Slide5

Velvet step #3: tip removal

Dead end paths removed

http://en.wikipedia.org/wiki/Velvet_assembler

Slide6

Velvet step #4: bubble popping

"Bubbles" caused by sequencing error resolved

http://en.wikipedia.org/wiki/Velvet_assembler

Slide7

Running velvet: single end

Step #1: velveth to set up database

$ velveth SRR826450_1 21 -short -fastq SRR826450_1.fastq

Parameters for directory name, kmer, read type, input file name

Step #2: velvetg to make graph

$ velvetg SRR826450_1 -exp_cov auto

Parameters for directory name, automatic coverage detection
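
As a rough sketch of how those parameters map onto the two commands, the Perl below builds both command lines from a directory name, a kmer, and an input file, then prints them without running anything. The sub name single_end_commands is hypothetical. The odd-kmer check reflects the note on this slide, and the flags are copied from the commands above.

```perl
use strict;
use warnings;

# Build the two command lines for a single-end assembly.
# Arguments mirror the slide: output directory, kmer, input FASTQ file.
sub single_end_commands {
    my ($dir, $kmer, $fastq) = @_;
    die "kmer must be an odd whole number\n"
        unless $kmer =~ /^\d+$/ && $kmer % 2 == 1;
    return (
        "velveth $dir $kmer -short -fastq $fastq",
        "velvetg $dir -exp_cov auto",
    );
}

# Print the commands only; nothing is executed here.
my @commands = single_end_commands("SRR826450_1", 21, "SRR826450_1.fastq");
print "$_\n" foreach @commands;
```

The same directory name has to be passed to both steps, because velvetg reads what velveth wrote there.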

Note: the max kmer size is set during software compilation. In our case the max is kmer=61

Note: only odd kmer numbers are allowed

Slide8

Running velvet: paired end

In a fit of cussedness, velvet requires its paired-end data in a single interleaved file

Use shuffleSequences_fastq.pl

Copy from /opt/bioinformatics2/velvet_1.2.10_31kmer/contrib/shuffleSequences_fastq/ into your directory
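
To show what "interleaved" means, here is a minimal Perl sketch that merges a forward and a reverse FASTQ file so that each read is followed directly by its mate. It is not the contrib script itself. The sub names are hypothetical, and it assumes four-line FASTQ records with both files listing the same reads in the same order.

```perl
use strict;
use warnings;

# Read one four-line FASTQ record; returns an empty list at end of file.
sub read_record {
    my ($fh) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        last unless defined $line;
        push @lines, $line;
    }
    return () unless @lines;
    die "Truncated FASTQ record\n" unless @lines == 4;
    return @lines;
}

# Write forward record 1, reverse record 1, forward record 2, and so on.
# Returns the number of read pairs written.
sub interleave_fastq {
    my ($forward, $reverse, $out) = @_;
    open(my $fwd, '<', $forward) or die "Cannot open $forward: $!\n";
    open(my $rev, '<', $reverse) or die "Cannot open $reverse: $!\n";
    open(my $merged, '>', $out)  or die "Cannot write $out: $!\n";
    my $pairs = 0;
    while (1) {
        my @rec1 = read_record($fwd);
        my @rec2 = read_record($rev);
        last unless @rec1 || @rec2;
        die "Files have different numbers of reads\n" unless @rec1 && @rec2;
        print $merged @rec1, @rec2;
        $pairs++;
    }
    close($fwd);
    close($rev);
    close($merged) or die "Cannot finish writing $out: $!\n";
    return $pairs;
}

# Runs only when three file names are given: forward, reverse, output.
if (@ARGV == 3) {
    my $pairs = interleave_fastq(@ARGV);
    print "Wrote $pairs read pairs\n";
}
```

The check for unequal read counts is there because a missing mate would shift every later pair out of step.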

Running velvet is otherwise similar

$ velveth SRR826450 21 -shortPaired -fastq SRR826450_paired.fastq

$ velvetg SRR826450 -exp_cov auto

Slide9

Velvet output

contigs.fa - contains output contigs

If paired end, contigs are joined into scaffolds using strings of Ns

Log - contains what commands have actually been run along with parameters, results summary

Slide10

Today's assignment part 1

Compare a single-end and paired-end assembly of SRR826450

Use a small kmer (e.g., 21) to keep things computationally simple for now

Slide11

Running commands using Perl

Perl can be used to run terminal commands, just like Bash

Use the system command to run whatever follows as a terminal command

e.g., system "ls -l > list";
# lists all of the files in the
# directory where the script was
# run and stores this information
# in a new file "list"

Slide12

Using system to run repetitive analyses

A useful way to use this sort of command is in a loop to run the same program multiple times using slightly different parameters

e.g.,
for (my $kmer = 15; $kmer <= 31; $kmer = $kmer + 2){
    print "kmer = $kmer\n";
    system "velveth infile_$kmer $kmer -shortPaired -fastq infile";
    system "velvetg infile_$kmer -exp_cov auto";
}

Slide13

Hints for system loops

Print statements can be helpful to keep track of the loop

Run the loop first with all system commands converted to print statements

Run only a single iteration of the entire loop

Slide14

Today's assignment part 2

Answer the question: what is the optimal kmer size to assemble the SRR826450 paired end reads?
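
One possible starting point, combining the earlier loop with the hints for system loops, is sketched below. It builds the velveth and velvetg commands for a range of odd kmers and only prints them unless asked to run. The sub names, the "run" argument, and the output directory prefix are hypothetical, and the 15 to 31 range is taken from the earlier loop example rather than being a recommendation.

```perl
use strict;
use warnings;

# Print a command, then run it unless $dry_run is true.
# Returns true on success (system returns 0 when the command succeeds).
sub run_or_print {
    my ($command, $dry_run) = @_;
    print "$command\n";
    return 1 if $dry_run;
    my $status = system $command;
    return $status == 0;
}

# Build the velveth/velvetg pair for every second kmer in a range.
# Assumes $min_kmer is odd, since only odd kmers are allowed.
sub sweep_commands {
    my ($prefix, $reads, $min_kmer, $max_kmer) = @_;
    my @commands;
    for (my $kmer = $min_kmer; $kmer <= $max_kmer; $kmer += 2) {
        push @commands,
            "velveth ${prefix}_$kmer $kmer -shortPaired -fastq $reads",
            "velvetg ${prefix}_$kmer -exp_cov auto";
    }
    return @commands;
}

# Dry run by default; pass "run" as the first argument to execute.
my $dry_run = !(@ARGV && $ARGV[0] eq 'run');
foreach my $command (sweep_commands("SRR826450", "SRR826450_paired.fastq", 15, 31)) {
    run_or_print($command, $dry_run) or die "Command failed: $command\n";
}
```

Stopping at the first failed command avoids running velvetg on a directory that velveth never finished setting up.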

For Thursday's class, read and be ready to discuss:

Magoc et al. 2013 Bioinformatics 29:1718-1725

Bradnam et al. GigaScience 2:10