Slide1

High Performance Computing for genomic applications
Using genomic software on Euler

Scientific IT Services
Michal Okoniewski, Samuel Fux, Manuel Kohler

11/22/17
Michal Okoniewski, Scientific IT ETH

Slide2

Bioinformatic modules on EULER

module load
module load gdc
module avail
module purge
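
The module commands above can be combined into a small helper for job scripts. This is a minimal sketch, not the official Euler recipe: load_stack is a hypothetical function name, and the module names in the usage line are the ones shown for GATK later in this deck. On a machine without a module system the helper only reports that and returns.

```shell
# Sketch: start from a clean environment, then load the requested modules.
# load_stack is a hypothetical helper name, not part of the module system.
load_stack() {
    if ! command -v module >/dev/null 2>&1; then
        echo "module command not found (not on a cluster shell?)" >&2
        return 1
    fi
    module purge          # unload everything loaded so far
    module load "$@"      # load the modules given as arguments, in order
}

# Module names taken from the GATK slide of this deck.
load_stack gcc/4.8.2 gdc java/1.8.0_73 gatk/3.5 || echo "skipping: no module system here"
```

Running module avail after module load gdc should then list the bioinformatic modules that became visible.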

Slide3

Bioinformatic software jungle - categories

Data processing and conversion: samtools, picard, …
Aligners: bwa, bowtie, SHRiMP, …
RNA aligners: STAR, tophat, subjunc
Transcriptome aligners: kallisto, sailfish, RSEM, …
Old-style aligners: Blast, Blat, VMATCH
De-novo assemblers: trinity, velvet, spades, …
Feature extraction, counting: HTSeq, featureCounts
Transcript discovery: cufflinks
Specialized tools: MISO, blast2go, …

Slide4

Bioinformatic software jungle

[Diagram: RNA-seq analysis workflow. Recoverable labels, grouped approximately by block:
Starting panel: experiment definition; options for analysis; mode: step-by-step or single-run; inputs: fastq, igenomes, GTF/GFF, genome.
Quality control: fastqc, Fastqc report; filters, trimming.
Alignment to the genome: STAR aligner, tophat aligner; outputs: BAM, junctions, unmapped fastq, unmapped BAM.
Genomic feature extraction - counting: ht_seq, SparkSeq counts, RSEM/Bitseq isoform deconvolution; outputs: count tables (gene, exon…), RPKM tables.
Statistical analysis tools: DESeq/edgeR, DEXSeq, cufflinks/cuffdiff, MISO/MATS, jsplice, SparkSeq tests, other; outputs: differential expression report, differential splicing report.
Functional analysis tools: David, String DB, GeneGo (commercial), Ingenuity (commercial).
Reporting panel: selection of graphs, report types, output formats (BED, CSV..), setting thresholds, functional analysis parameters.
Reporting output: CSV, BED, genome browser.]

Slide5

Bowtie, bowtie2

Building genome index:
bowtie2-build --threads 24 /cluster/home/michalo/work_michalo/hg38/Homo_sapiens.GRCh38.dna.primary_assembly.fa /cluster/scratch/michalo/hg38/hg38

Alignment:
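
The transcript does not preserve the alignment command for this slide, so the following is only a sketch of how index building and a single-end alignment could be chained. The reference and index paths come from the slide; the read file name is the one used with tophat later in the deck; the output name mini_bt2.sam and the run helper are assumptions. With DRY_RUN=1 (the default here) the commands are printed rather than executed.

```shell
# Sketch only: output names are hypothetical, paths are from the slide.
REF=/cluster/home/michalo/work_michalo/hg38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
INDEX=/cluster/scratch/michalo/hg38/hg38   # index prefix written by bowtie2-build
READS=mini.fastq.gz                        # single-end reads
THREADS=24

# Print the command when DRY_RUN=1 (default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Build the index once per reference genome.
run bowtie2-build --threads "$THREADS" "$REF" "$INDEX"

# 2. Align unpaired reads (-U) against the index (-x) and write SAM (-S).
run bowtie2 -p "$THREADS" -x "$INDEX" -U "$READS" -S mini_bt2.sam
```

Set DRY_RUN=0 inside a batch job to run the two steps for real.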

Slide6

Tophat

Classic splice-aware aligner
Uses bowtie2 as engine, so also the bowtie2 index

tophat -p 24 -o tophat_out --library-type fr-firststrand ~/work_michalo/hg38/hg38 mini.fastq.gz

Manual: http://ccb.jhu.edu/software/tophat/index.shtml

Slide7


“Tuxedo suite”

Slide8

Cufflinks

Transcript discovery tool
Uses coverage and junctions from a BAM file

cufflinks mini_star.sorted.bam

Other: cuffmerge, cuffdiff, cuffquant, cuffnorm, CummeRbund

Slide9

Cufflinks

Produces GTF

Slide10

STAR

Splice-aware aligner, loading index into memory
Results similar to tophat, but faster
--genomeLoad LoadAndKeep
With specific options, can produce BAM and do the counting too

https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
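
The slide names the options but gives no full command line, so here is one possible invocation as a hedged sketch. The index directory and output prefix are hypothetical; the flags are standard STAR options. --quantMode GeneCounts only works if annotations were supplied when the index was generated, which is assumed here. With DRY_RUN=1 (the default) the commands are printed, not run.

```shell
# Sketch only: GENOME_DIR and the output prefix are hypothetical.
GENOME_DIR=/cluster/scratch/michalo/hg38/star_index
READS=mini.fastq.gz
THREADS=24

# Print the command when DRY_RUN=1 (default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Align gzipped single-end reads, keep the index in shared memory for later
# jobs on the same node, write an unsorted BAM and per-gene read counts.
run STAR --runThreadN "$THREADS" \
    --genomeDir "$GENOME_DIR" \
    --genomeLoad LoadAndKeep \
    --readFilesIn "$READS" \
    --readFilesCommand zcat \
    --outSAMtype BAM Unsorted \
    --quantMode GeneCounts \
    --outFileNamePrefix mini_star.

# Release the shared-memory copy of the index when all jobs are done.
run STAR --genomeDir "$GENOME_DIR" --genomeLoad Remove
```

With this prefix STAR would write mini_star.Aligned.out.bam and mini_star.ReadsPerGene.out.tab. Unsorted BAM output is chosen because coordinate-sorted output together with a shared-memory index needs an explicit sorting memory limit.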

Slide11


STAR

Slide12

subread

Includes subjunc (similar to STAR) and featureCounts

Building index:
subread-buildindex -o /cluster/home/michalo/work_michalo/hg38/subread_index/hg38 /cluster/home/michalo/work_michalo/hg38/Homo_sapiens.GRCh38.dna.primary_assembly.fa

Alignment:
subread-align -T 24 -i /cluster/home/michalo/work_michalo/hg38/subread_index/hg38 -r mini.fastq -o mapped_reads_subjunc/mini.bam
subjunc -T 24 -i /cluster/home/michalo/work_michalo/hg38/subread_index/hg38 -r mini.fastq -o mapped_reads_subjunc/mini.bam

http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf

Slide13

samtools

General purpose tool for conversion between BAM and SAM
Many other operations: pileup…
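
The conversions mentioned above could look like the following sketch. The file names are hypothetical and the run helper is an illustration device; the samtools subcommands and flags are the standard ones from samtools 1.x. With DRY_RUN=1 (the default) the commands are only printed.

```shell
# Print the command when DRY_RUN=1 (default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# SAM -> BAM (-b: write binary, compressed output)
run samtools view -b -o mini.bam mini.sam

# BAM -> SAM (-h: keep the header in the text output)
run samtools view -h -o mini.roundtrip.sam mini.bam

# Sort by coordinate with 4 extra threads, then index the sorted file;
# most downstream tools and genome browsers expect both.
run samtools sort -@ 4 -o mini.sorted.bam mini.bam
run samtools index mini.sorted.bam
```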

Slide14

featureCounts

Fast and flexible counting in genomic features

featureCounts -M -s 2 -T 24 -t gene -g gene_id -a /cluster/home/michalo/work_michalo/hg38/Homo_sapiens.GRCh38.86.chr.gtf -o mini.cnt mini_star.sorted.bam

Important options:
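
The list of important options did not survive in the transcript. As a hedged reading of the command above, the comments below summarize what each flag does according to the featureCounts documentation, followed by a small helper for pulling the counts out of the result. counts_only is a hypothetical name, and the helper assumes the usual output layout: one '#' comment line, one header line, and the count for the first BAM file in column 7.

```shell
# Flags used in the command on this slide:
#   -M          also count multi-mapping reads
#   -s 2        reads are reversely stranded (0 = unstranded, 1 = stranded)
#   -T 24       number of threads
#   -t gene     feature type (GTF column 3) to count over
#   -g gene_id  GTF attribute used to group features
#   -a FILE     annotation file (GTF)
#   -o FILE     output table

# Hypothetical helper: print "gene id <TAB> count" for the first sample.
counts_only() {
    grep -v '^#' "$1" | tail -n +2 | cut -f 1,7
}

# Show the first rows if the table from the slide exists in this directory.
if [ -f mini.cnt ]; then
    counts_only mini.cnt | head
fi
```

The resulting two-column table is a convenient input for the statistical tools named earlier in the deck, such as DESeq or edgeR.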

Slide15


featureCounts

Slide16

GATK

GATK is a genomic toolbox for various operations, mainly related to genomic variant calling
Operations include producing a variant file *.vcf from an alignment file *.bam

module load gcc/4.8.2 gdc java/1.8.0_73 gatk/3.5

java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref/human_g1k_b37_20.fasta -I bams/exp_design/NA12878_wgs_20.bam -o sandbox/NA12878_wgs_20_UG_calls.vcf -glm BOTH -L 20:10,000,000-10,200,000

https://software.broadinstitute.org/gatk/documentation/tooldocs/current/
https://software.broadinstitute.org/gatk/documentation/topic?name=tutorials
http://gatkforums.broadinstitute.org/gatk/discussion/7869/howto-discover-variants-with-gatk-a-gatk-workshop-tutorial

Slide17

Using genomic software on Euler

Thank you!