Slide1
Scientific IT Services
Michal Okoniewski, Samuel Fux, Manuel Kohler
11/22/17
High Performance Computing for genomic applications
Using genomic software on Euler
Michal Okoniewski, Scientific IT ETH
Slide2
Bioinformatic modules on EULER
module load
module load gdc
module avail
module purge
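A typical session combining these commands might look like the sketch below. Only the gdc module name comes from the slide; samtools as a module name is an assumption, so check the output of module avail for the names and versions actually installed. The guard lets the sketch run harmlessly on machines without the module system.

```shell
# Sketch of a module session on Euler. Only "gdc" is taken from the slide;
# "samtools" as a module name is an assumption, check `module avail`.
TOOL_MODULE=samtools

if command -v module >/dev/null 2>&1; then
    module purge        # start from a clean environment
    module load gdc     # load gdc first, as on the slide
    # module avail usually prints to stderr, hence the redirect
    module avail 2>&1 | grep -i "$TOOL_MODULE" || echo "no module matching $TOOL_MODULE"
    module load "$TOOL_MODULE" || echo "could not load $TOOL_MODULE"
    module list         # show what is loaded now
else
    echo "module command not available; run this on an Euler login node"
fi
```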
Slide3
Bioinformatic software jungle - categories
Data processing and conversion: samtools, picard, …
Aligners: bwa, bowtie, SHRiMP, …
RNA aligners: STAR, tophat, subjunc
Transcriptome aligners: kallisto, sailfish, RSEM, …
Old-style aligners: Blast, Blat, VMATCH
De-novo assemblers: trinity, velvet, spades, …
Feature extraction, counting: HTSeq, featureCounts
Transcript discovery: cufflinks
Specialized tools: MISO, blast2go, …
Slide4
Bioinformatic software jungle
[Diagram: RNA-seq analysis workflow, labels only]
Starting panel: experiment definition; genome; GTF/GFF (igenomes); options for analysis; mode: step-by-step or single-run
Inputs: fastq; filters, trimming; fastqc report
Alignment to the genome: STAR aligner, tophat aligner; BAM; unmapped fastq; unmapped BAM
Genomic feature extraction, counting: ht_seq, SparkSeq counts, SparkSeq junctions, RSEM/Bitseq isoform deconvolution; count tables (gene, exon…); RPKM tables; junctions
Statistical analysis tools: DESeq/edgeR, DEXSeq, SparkSeq tests, cufflinks/cuffdiff; other: MISO/MATS, jsplice
Functional analysis tools: David, String DB, GeneGo (commercial), Ingenuity (commercial)
Reporting panel: selection of graphs; report types; output formats (BED, CSV, …); setting thresholds; functional analysis parameters
Reporting output: differential expression report; differential splicing report; CSV; BED; genome browser
Slide5
Bowtie, bowtie2
Building genome index
bowtie2-build --threads 24 /cluster/home/michalo/work_michalo/hg38/Homo_sapiens.GRCh38.dna.primary_assembly.fa /cluster/scratch/michalo/hg38/hg38
Alignment:
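The alignment command itself did not survive on this slide. A minimal sketch of a single-end bowtie2 run against the hg38 index from the bowtie2-build step could look as follows. The read file is borrowed from the tophat example in these slides, and the output name mini_bowtie2.sam is an assumption.

```shell
# Sketch: single-end alignment with bowtie2 against the hg38 index from the slide.
# READS is borrowed from the tophat slide; OUT_SAM is a hypothetical output name.
THREADS=24
INDEX=/cluster/scratch/michalo/hg38/hg38
READS=mini.fastq.gz
OUT_SAM=mini_bowtie2.sam

if command -v bowtie2 >/dev/null 2>&1 && [ -f "$READS" ]; then
    # -p threads, -x index prefix, -U unpaired reads, -S SAM output
    bowtie2 -p "$THREADS" -x "$INDEX" -U "$READS" -S "$OUT_SAM"
else
    echo "bowtie2 or $READS not found; skipping alignment"
fi
```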
Slide6
Tophat
Classic splice-aware aligner
Uses bowtie2 as its engine, so it also needs a bowtie2 index
tophat -p 24 -o tophat_out --library-type fr-firststrand ~/work_michalo/hg38/hg38 mini.fastq.gz
Manual: http://ccb.jhu.edu/software/tophat/index.shtml
Slide7
“Tuxedo suite”
Slide8
Cufflinks
Transcript discovery tool
Uses coverage and junctions from a BAM file
cufflinks mini_star.sorted.bam
Other tools: cuffmerge, cuffdiff, cuffquant, cuffnorm, CummeRbund
Slide9
Cufflinks
Produces GTF
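As a sketch of where that GTF ends up: with an explicit output directory, cufflinks writes the assembled transcripts to transcripts.gtf inside it. The directory name cufflinks_out and the thread count are assumptions; the BAM name is the one used on the previous slide.

```shell
# Sketch: run cufflinks with an explicit output directory and locate the GTF.
# OUT_DIR is a hypothetical name; BAM is the file used on the Cufflinks slide.
THREADS=24
BAM=mini_star.sorted.bam
OUT_DIR=cufflinks_out
GTF_OUT="$OUT_DIR/transcripts.gtf"

if command -v cufflinks >/dev/null 2>&1 && [ -f "$BAM" ]; then
    # -p threads, -o output directory
    cufflinks -p "$THREADS" -o "$OUT_DIR" "$BAM"
    head -n 5 "$GTF_OUT"    # peek at the assembled transcripts
else
    echo "cufflinks or $BAM not found; skipping transcript assembly"
fi
```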
Slide10
STAR
Splice-aware aligner, loading index into memory
Results similar to tophat, but faster
--genomeLoad LoadAndKeep
With specific options, can produce BAM and do the counting too
https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
Slide11
STAR
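The command shown on this slide was not preserved. As a rough sketch, a STAR run that keeps the index in shared memory, writes BAM and counts reads per gene might look like this. The genome directory and output prefix are hypothetical, and --quantMode GeneCounts assumes the index was generated with a GTF annotation.

```shell
# Sketch: STAR alignment with a shared-memory index, BAM output and gene counts.
# GENOME_DIR and PREFIX are hypothetical; the index is assumed to have been
# generated with a GTF annotation, which --quantMode GeneCounts relies on.
THREADS=24
GENOME_DIR=/cluster/scratch/michalo/hg38/star_index
READS=mini.fastq.gz
PREFIX=mini_star_

if command -v STAR >/dev/null 2>&1 && [ -f "$READS" ]; then
    STAR --runThreadN "$THREADS" \
         --genomeDir "$GENOME_DIR" \
         --genomeLoad LoadAndKeep \
         --readFilesIn "$READS" \
         --readFilesCommand zcat \
         --outSAMtype BAM Unsorted \
         --quantMode GeneCounts \
         --outFileNamePrefix "$PREFIX"
else
    echo "STAR or $READS not found; skipping alignment"
fi
```

With this prefix the unsorted alignments land in mini_star_Aligned.out.bam and the per-gene counts in mini_star_ReadsPerGene.out.tab. Sorting inside STAR (BAM SortedByCoordinate) together with LoadAndKeep additionally requires setting --limitBAMsortRAM, which is why the sketch leaves sorting to samtools.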
Slide12
subread
Includes subjunc (similar to STAR) and featureCounts
Building index
subread-buildindex -o /cluster/home/michalo/work_michalo/hg38/subread_index/hg38 /cluster/home/michalo/work_michalo/hg38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
Alignment
subread-align -t 0 -T 24 -i /cluster/home/michalo/work_michalo/hg38/subread_index/hg38 -r mini.fastq -o mapped_reads_subjunc/mini.bam
subjunc -T 24 -i /cluster/home/michalo/work_michalo/hg38/subread_index/hg38 -r mini.fastq -o mapped_reads_subjunc/mini.bam
http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf
Slide13
samtools
General-purpose tool for conversion between BAM and SAM
Many other operations: pileup, …
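As an illustration, the most common conversions can be sketched as below. The input and intermediate file names are assumptions, chosen so that the result matches the mini_star.sorted.bam used elsewhere in these slides; the syntax is that of samtools 1.3 or newer.

```shell
# Sketch: SAM -> BAM conversion, coordinate sorting and indexing with samtools.
# IN_SAM and RAW_BAM are hypothetical file names.
IN_SAM=mini_star.sam
RAW_BAM=mini_star.bam
SORTED_BAM=mini_star.sorted.bam

if command -v samtools >/dev/null 2>&1 && [ -f "$IN_SAM" ]; then
    samtools view -b -o "$RAW_BAM" "$IN_SAM"          # SAM to BAM
    samtools sort -@ 4 -o "$SORTED_BAM" "$RAW_BAM"    # sort by coordinate, 4 extra threads
    samtools index "$SORTED_BAM"                      # writes the .bai index
    samtools view -h "$SORTED_BAM" | head -n 5        # BAM back to SAM text
else
    echo "samtools or $IN_SAM not found; skipping conversion"
fi
```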
Slide14
featureCounts
Fast and flexible counting in genomic features
featureCounts -M -s 2 -T 24 -t gene -g gene_id -a /cluster/home/michalo/work_michalo/hg38/Homo_sapiens.GRCh38.86.chr.gtf -o mini.cnt mini_star.sorted.bam
Important options:
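The option list itself did not survive on this slide. As a reference sketch, the block below repeats the featureCounts call from the slide with one comment per flag, following the meanings given in the featureCounts documentation. The guard is only there so that the sketch does nothing when the tool or the BAM file is missing.

```shell
# The featureCounts call from the slide, with each option annotated.
GTF=/cluster/home/michalo/work_michalo/hg38/Homo_sapiens.GRCh38.86.chr.gtf
BAM=mini_star.sorted.bam
OUT=mini.cnt

# -M           also count multi-mapping reads
# -s 2         reads are reversely stranded (0 = unstranded, 1 = stranded)
# -T 24        number of threads
# -t gene      feature type (third GTF column) to count over
# -g gene_id   GTF attribute used to group features into meta-features
# -a / -o      annotation file / output count table
if command -v featureCounts >/dev/null 2>&1 && [ -f "$BAM" ]; then
    featureCounts -M -s 2 -T 24 -t gene -g gene_id -a "$GTF" -o "$OUT" "$BAM"
else
    echo "featureCounts or $BAM not found; skipping counting"
fi
```

Besides the count table, featureCounts writes a summary file next to it (here mini.cnt.summary) with the numbers of assigned and unassigned reads.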
Slide15
featureCounts
Slide16
GATK
GATK is a genomic toolbox for various operations, mainly related to genomic variant calling
Operations include producing a variant file (*.vcf) from an alignment file (*.bam)
module load gcc/4.8.2 gdc java/1.8.0_73 gatk/3.5
java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref/human_g1k_b37_20.fasta -I bams/exp_design/NA12878_wgs_20.bam -o sandbox/NA12878_wgs_20_UG_calls.vcf -glm BOTH -L 20:10,000,000-10,200,000
https://software.broadinstitute.org/gatk/documentation/tooldocs/current/
https://software.broadinstitute.org/gatk/documentation/topic?name=tutorials
http://gatkforums.broadinstitute.org/gatk/discussion/7869/howto-discover-variants-with-gatk-a-gatk-workshop-tutorial
Slide17
Using genomic software on Euler
Thank you!