Slide1
MCB3895-004 Lecture #11
Sept 30/14
De novo genome assembly using Velvet; using Perl to execute system commands
Slide2
Velvet
Velvet was one of the first widely used de novo assembly programs
http://www.ebi.ac.uk/~zerbino/velvet/Manual.pdf
It is still quite common, although frequently superseded by newer assemblers
Slide3
Velvet step #1: de Bruijn graph construction
http://en.wikipedia.org/wiki/Velvet_assembler
Red bases = sequencing error
Slide4
Velvet step #2: graph simplification
Unbranched paths collapsed into single nodes
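To make the first two steps concrete, here is a toy sketch in Perl. It assumes a deliberately simplified model that is not Velvet's actual implementation: nodes are plain k-mers, edges link k-mers that are adjacent in a read, and reverse complements, coverage, and circular paths are ignored. The sub names build_graph and collapse_paths and the two example reads are made up for illustration.

```perl
use strict;
use warnings;

# Step 1 (toy version): every k-mer is a node; consecutive k-mers
# within a read are joined by an edge.
sub build_graph {
    my ($k, @reads) = @_;
    my (%succ, %pred);
    for my $read (@reads) {
        my @kmers = map { substr($read, $_, $k) } 0 .. length($read) - $k;
        for my $kmer (@kmers) {
            $succ{$kmer} //= {};
            $pred{$kmer} //= {};
        }
        for my $i (0 .. $#kmers - 1) {
            $succ{ $kmers[$i] }{ $kmers[ $i + 1 ] } = 1;
            $pred{ $kmers[ $i + 1 ] }{ $kmers[$i] } = 1;
        }
    }
    return (\%succ, \%pred);
}

# Step 2 (toy version): merge each unbranched run of nodes into one sequence.
sub collapse_paths {
    my ($succ, $pred) = @_;
    my @paths;
    for my $node (sort keys %$succ) {
        my @in = keys %{ $pred->{$node} };
        # Skip nodes that sit in the middle of an unbranched path.
        next if @in == 1 && scalar(keys %{ $succ->{ $in[0] } }) == 1;
        my ($seq, $cur) = ($node, $node);
        while (1) {
            my @out = keys %{ $succ->{$cur} };
            last unless @out == 1 && scalar(keys %{ $pred->{ $out[0] } }) == 1;
            $cur = $out[0];
            $seq .= substr($cur, -1);   # each step adds one new base
        }
        push @paths, $seq;
    }
    return @paths;
}

my ($succ, $pred) = build_graph(3, "ATGGCG", "GGCGTA");
print "$_\n" for collapse_paths($succ, $pred);
```

With these two overlapping reads, the six 3-mers form one unbranched path and collapse into the single sequence ATGGCGTA. A read carrying a sequencing error would add a branch, which is what the tip removal and bubble popping steps deal with.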
http://en.wikipedia.org/wiki/Velvet_assembler
Slide5
Velvet step #3: tip removal
Dead end paths removed
http://en.wikipedia.org/wiki/Velvet_assembler
Slide6
Velvet step #4: bubble popping
"Bubbles" caused by sequencing error resolved
http://en.wikipedia.org/wiki/Velvet_assembler
Slide7
Running velvet: single end
Step #1: velveth to set up database
$ velveth SRR826450_1 21 -short -fastq SRR826450_1.fastq
Parameters: output directory name, kmer, read type (-short), file format (-fastq), input file name
Step #2: velvetg to make graph
$ velvetg SRR826450_1 -exp_cov auto
Parameters: directory name, automatic coverage detection
Note: the max kmer size is set during software compilation. In our case the max is kmer=61
Note: only odd kmer numbers are allowed
Slide8
Running velvet: paired end
In a fit of cussedness, velvet requires its paired-end data in a single interleaved file
Use shuffleSequences_fastq.pl
Copy from /opt/bioinformatics2/velvet_1.2.10_31kmer/contrib/shuffleSequences_fastq/ into your directory
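To show what "interleaved" means here, the following is a minimal sketch of the operation, not the contrib script itself. It assumes both input files list the same reads in the same order and that every FASTQ record is exactly four lines. The sub name interleave_fastq and the file name SRR826450_2.fastq are hypothetical; only SRR826450_1.fastq and SRR826450_paired.fastq appear in these slides.

```perl
use strict;
use warnings;

# Read one 4-line FASTQ record; returns an empty list at end of file.
sub read_record {
    my ($fh) = @_;
    my @lines;
    for (1 .. 4) {
        my $line = <$fh>;
        if (!defined $line) {
            die "Truncated FASTQ record\n" if @lines;
            return;
        }
        chomp $line;
        push @lines, $line;
    }
    return @lines;
}

# Write read 1 of each pair, then its mate, into one output file.
sub interleave_fastq {
    my ($fwd, $rev, $out) = @_;
    open(my $f, '<', $fwd) or die "Cannot open $fwd: $!\n";
    open(my $r, '<', $rev) or die "Cannot open $rev: $!\n";
    open(my $o, '>', $out) or die "Cannot write $out: $!\n";
    my $pairs = 0;
    while (1) {
        my @a = read_record($f);
        my @b = read_record($r);
        last if !@a && !@b;
        die "Input files contain different numbers of reads\n" if !@a || !@b;
        print {$o} map { "$_\n" } @a, @b;
        $pairs++;
    }
    close $_ for $f, $r, $o;
    return $pairs;
}

# Hypothetical usage:
#   perl interleave.pl SRR826450_1.fastq SRR826450_2.fastq SRR826450_paired.fastq
if (@ARGV == 3) {
    my $n = interleave_fastq(@ARGV);
    print "Wrote $n read pairs to $ARGV[2]\n";
}
```

In the output, each forward read is immediately followed by its reverse mate, which is the layout the -shortPaired read type expects.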
Running velvet is otherwise similar
$ velveth SRR826450 21 -shortPaired -fastq SRR826450_paired.fastq
$ velvetg SRR826450 -exp_cov auto
Slide9
Velvet output
contigs.fa - contains output contigs
If paired end, contigs are joined into scaffolds using strings of Ns
Log - contains what commands have actually been run along with parameters, results summary
Slide10
Today's assignment part 1
Compare a single-end and paired-end assembly of SRR826450
Use a small kmer (e.g., 21) to keep things computationally simple for now
Slide11
Running commands using Perl
Perl can be used to run terminal commands, just like Bash
Use the system command to run whatever follows as a terminal command
e.g., system "ls -l > list";
# lists all of the files in the
# directory where the script was
# run and stores this information
# in a new file "list"
Slide12
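One refinement to the system example: system returns the exit status of the command, with 0 meaning success, so a script can stop as soon as a step fails instead of carrying on with missing files. A minimal sketch follows; the sub name run_checked is made up for illustration.

```perl
use strict;
use warnings;

# Run a terminal command and stop the script if it did not succeed.
sub run_checked {
    my ($cmd) = @_;
    my $status = system($cmd);
    if ($status == -1) {
        die "Could not start '$cmd': $!\n";
    }
    if ($status & 127) {
        die sprintf("'%s' was killed by signal %d\n", $cmd, $status & 127);
    }
    if ($status != 0) {
        # The exit code of the command is stored in the high byte.
        die sprintf("'%s' exited with code %d\n", $cmd, $status >> 8);
    }
    return 1;
}

run_checked("ls -l > list");
print "Directory listing written to 'list'\n";
```

This matters most for long runs: if velveth fails, there is no point in starting velvetg on the same directory.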
Using system to run repetitive analyses
A useful way to use this sort of command is in a loop to run the same program multiple times using slightly different parameters
e.g.,
for (my $kmer = 15; $kmer <= 31; $kmer = $kmer + 2){
    print "kmer = $kmer\n";
    system "velveth infile_$kmer $kmer -shortPaired -fastq infile";
    system "velvetg infile_$kmer -exp_cov auto";
}
Slide13
Hints for system loops
Print statements can be helpful for keeping track of the loop
Run the loop first with all system commands converted to print statements
Run only a single iteration of the entire loop
Slide14
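The three hints for system loops can be combined in one script. This is only a sketch: the $dry_run and $one_only switches and the velvet_commands sub are made-up names, and the directory prefix SRR826450 is an assumed choice. With $dry_run set to 1, nothing is executed and the commands are only printed.

```perl
use strict;
use warnings;

my $dry_run  = 1;   # 1 = only print the commands, 0 = actually run them
my $one_only = 0;   # 1 = stop after the first iteration
my $prefix   = "SRR826450";               # assumed output directory prefix
my $infile   = "SRR826450_paired.fastq";

# Build the two Velvet command lines for one kmer value.
sub velvet_commands {
    my ($prefix, $kmer, $infile) = @_;
    return (
        "velveth ${prefix}_$kmer $kmer -shortPaired -fastq $infile",
        "velvetg ${prefix}_$kmer -exp_cov auto",
    );
}

for (my $kmer = 15; $kmer <= 31; $kmer += 2) {
    print "kmer = $kmer\n";
    for my $cmd (velvet_commands($prefix, $kmer, $infile)) {
        if ($dry_run) {
            print "would run: $cmd\n";
        }
        else {
            system($cmd) == 0 or die "Command failed: $cmd\n";
        }
    }
    last if $one_only;
}
```

Once the printed commands look right, set $one_only to 1 and $dry_run to 0 to test a single real iteration, then set $one_only back to 0 for the full run.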
Today's assignment part 2
Answer the question: what is the optimal kmer size to assemble the SRR826450 paired end reads?
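The slides do not say how "optimal" should be judged. One common starting point, used here as an assumption, is to compare the number of contigs, total length, and N50 of each contigs.fa. The sketch below assumes the assemblies sit in directories named SRR826450_15 through SRR826450_31; that naming is hypothetical and should be adjusted to match your own loop.

```perl
use strict;
use warnings;

# Read contig lengths from a FASTA file (a sequence may span several lines).
sub contig_lengths {
    my ($fasta) = @_;
    open(my $fh, '<', $fasta) or die "Cannot open $fasta: $!\n";
    my @lengths;
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;
        if ($line =~ /^>/) {
            push @lengths, 0;
        }
        elsif (@lengths) {
            $lengths[-1] += length $line;
        }
    }
    close $fh;
    return @lengths;
}

# N50: going from longest to shortest contig, the length of the contig
# at which the running total first reaches half of the assembly length.
sub n50 {
    my @sorted = sort { $b <=> $a } @_;
    my $total = 0;
    $total += $_ for @sorted;
    my $running = 0;
    for my $len (@sorted) {
        $running += $len;
        return $len if $running * 2 >= $total;
    }
    return 0;
}

for (my $kmer = 15; $kmer <= 31; $kmer += 2) {
    my $fasta = "SRR826450_$kmer/contigs.fa";   # assumed directory naming
    next unless -e $fasta;
    my @len = contig_lengths($fasta);
    my $total = 0;
    $total += $_ for @len;
    printf "kmer=%d contigs=%d total=%d N50=%d\n",
        $kmer, scalar @len, $total, n50(@len);
}
```

Note that scaffolds in a paired-end contigs.fa include runs of Ns, so these lengths count the Ns as well.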
For Thursday's class, read and be ready to discuss:
Magoc et al. 2013 Bioinformatics 29:1718-1725
Bradnam et al. GigaScience 2:10