Genomics Virtual Lab: analyze your data with a mouse click
Igor Makunin, i.makunin@uq.edu.au
QAAFI, UQ, April 8, 2015
Research Computing Centre @ UQ
Genomics Virtual Laboratory
Genome-scale experiments are relatively cheap and very popular:
- the cost of high-throughput sequencing is going down
- available data (genomes, transcripts, etc.)
Analysis of NGS data is a bottleneck (infrastructure, skills)
Genomics Virtual Lab: take the IT out of bioinformatics
- web-based resources (biologist-friendly)
- DIY bioinformatics environment (for geeks)
GVL advantages:
- public resources (no charge to users)
- available immediately
GVL products and services
Genomics Virtual Lab: genome.edu.au
The main aim: to facilitate genomics research in Australia
Galaxy: tutorials and protocols (nextGen sequencing)
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy for full-scale analysis: galaxy-qld.genome.edu.au
“Roll your own” Galaxy on the Australian government-funded computing infrastructure (NeCTAR cloud) + IPython Notebook + RStudio
Deploy your own computer cluster (NeCTAR cloud)
Mirror of UCSC Genome Browser
RStudio
Learn
Use
Get
Info
Galaxy: what does it look like?
[Screenshot of the Galaxy interface, labelled: Tools, Working window, History]
Galaxy: possibilities
You can:
- analyze genome-scale nextGen sequencing data without bash scripting
- work with big datasets, genomic regions, sequences, etc.
- create and use workflows (record the steps of your analysis)
- share results and workflows with a user or make them available to anyone
Data import:
- upload through the web interface
- FTP (for big datasets)
Public data:
- UCSC Genome Browser
- UCSC Archaea
- Microbial data
- EBI SRA
Over 2,000 tools available through the Galaxy Tool Shed
Use: local Galaxy-qld server
GVL Galaxy in Queensland: galaxy-qld.genome.edu.au
- BWA, Bowtie, Bowtie2
- Velvet (microbial genome assembly)
- Trinity (de novo transcript assembly)
- TopHat, TopHat2 (RNA-Seq)
- DESeq, edgeR, Cufflinks (differential gene expression)
- Variant detection tools
- Metagenomics tools
- MACS, MACS2, SPP (ChIP-Seq)
- SAMtools
- Picard
Hundreds of users
Thousands of jobs per month
Up to 1 TB per user (for UQ users)
Data manipulation on Galaxy-qld
GVL Galaxy in Queensland: galaxy-qld.genome.edu.au
Useful tools for data manipulation:
- FASTA manipulation
- MEME (identification of motifs)
- BLAST search
- Text manipulation: add column, merge, cut, trim, compute expression, etc.
- Filter and Sort
- Join, Subtract and Group
- Format conversion (genomics)
- Operate on Genomic Intervals (including Fetch closest feature)
- Statistics
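To illustrate what a closest-feature query does conceptually, here is a minimal Python sketch that finds, for one query interval, the nearest feature on the same chromosome. It assumes BED-style 0-based, half-open coordinates and counts overlapping or touching features as distance 0. It is not Galaxy's implementation: the names `Interval`, `distance` and `closest_feature` are hypothetical, and the Galaxy tool's exact rules (for example, how it treats overlaps and ties) may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive (BED-style, assumed)
    end: int    # exclusive
    name: str = ""


def distance(a: Interval, b: Interval) -> int:
    """Number of bases separating two intervals on the same chromosome.

    Returns 0 if the intervals overlap or touch (book-ended).
    """
    if a.end <= b.start:
        return b.start - a.end
    if b.end <= a.start:
        return a.start - b.end
    return 0


def closest_feature(query: Interval, features: Sequence[Interval]) -> Optional[Interval]:
    """Return the feature nearest to `query` on the same chromosome, or None."""
    same_chrom = [f for f in features if f.chrom == query.chrom]
    if not same_chrom:
        return None
    # min() keeps the first feature encountered when distances tie.
    return min(same_chrom, key=lambda f: distance(query, f))
```

For example, with genes at chr1:100-200 ("geneA") and chr1:500-800 ("geneB"), a peak at chr1:420-450 is 220 bases from geneA and 50 bases from geneB, so `closest_feature` returns geneB. A real tool would sort the intervals once rather than scanning every feature per query; the linear scan here just keeps the idea visible.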
Good user practice for Galaxy-qld
GVL Galaxy in Queensland: galaxy-qld.genome.edu.au
- Register with your UQ email address to get a bigger disk allocation.
- Use FTP for big datasets: it is faster. Galaxy recognises .gz compression.
- Do not store unneeded datasets. Delete temporary files such as SAM files, and purge deleted datasets.
- Do not start many big jobs in parallel (BWA, Bowtie, Bowtie2, TopHat, TopHat2, Velvet, Trinity).
- Create and use workflows for multi-step analyses.
- Specify the quality score encoding for nextGen sequencing data (FASTQ files).
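The encoding matters because the same ASCII character stands for a different quality score under the Phred+33 (Sanger, Illumina 1.8+) and Phred+64 (Illumina 1.3 to 1.7) conventions. If you are unsure which one your reads use, a rough check is to look at the range of quality characters. A minimal heuristic sketch in Python, assuming plain 4-line FASTQ records (no wrapped sequence lines); the function names are hypothetical, and an "ambiguous" result should be checked against what your sequencing provider reports.

```python
from typing import Iterable, Iterator


def quality_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality line (4th line) of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_encoding(lines: Iterable[str], max_records: int = 10000) -> str:
    """Guess the FASTQ quality encoding from the ASCII range of quality characters.

    Heuristic only: returns "phred33", "phred64", "ambiguous" or "unknown".
    """
    lowest, highest = 255, 0
    for n, qual in enumerate(quality_lines(lines)):
        if n >= max_records:  # a sample of records is usually enough
            break
        for ch in qual:
            code = ord(ch)
            lowest = min(lowest, code)
            highest = max(highest, code)
    if highest == 0:
        return "unknown"  # no quality lines were seen
    if lowest < 59:
        # Characters below ';' (ASCII 59) never occur in Phred+64 or Solexa+64.
        return "phred33"
    if lowest >= 64 and highest > 74:
        # Starts at '@' or above and goes beyond 'J' (74): typical of Phred+64.
        return "phred64"
    return "ambiguous"
```

Usage: `with open("reads.fastq") as fh: print(guess_encoding(fh))`, where `reads.fastq` is a placeholder file name. For a gzipped file, open it in text mode with `gzip.open(path, "rt")` and pass the handle in the same way. In Galaxy, "phred33" corresponds to the `fastqsanger` datatype and "phred64" to `fastqillumina`.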
Mirror of the UCSC Genome Browser
ucsc.genome.edu.au
- full mirror, regularly updated
- user data are kept for a long time
Use: RStudio
http://gvl-rstudio.genome.edu.au/rstudio/
- based on the GVL cluster
- genome data from Galaxy
Email help@genome.edu.au to register
Genomics Virtual Lab: Learn
Genomics VL site: genome.edu.au
- easy-to-follow Galaxy tutorials (DIY, online)
- a dedicated Galaxy server: galaxy-tut.genome.edu.au
- topics: RNA-Seq, variant detection, ChIP-Seq, microbial genome assembly …
Training through QFAB (for a nominal fee): qfab.org
GVL Get: roll your own Galaxy
Default NeCTAR allocation for UQ users: 2 CPUs, 8 GB RAM
Start your own virtual computer cluster on the NeCTAR cloud
Start your own Galaxy on the NeCTAR cloud:
- admin rights (you can add tools)
- as powerful as needed (within your allocation)
- ability to add worker nodes
- IPython Notebook
- RStudio
Detailed instructions are available on the Genomics VL site.
Follow announcements on the QFAB website: qfab.org
Summary
GVL provides resources for genomics research:
- learn: Galaxy-tut
- use: local Galaxy-qld
- get: roll your own
We are interested in users and their feedback:
- What do you want to do? Any special needs? (tools, datasets, resources)
- What do you want to learn?
- Do you want to share or promote your workflows with other people?
Talk to us: Igor Makunin, i.makunin@uq.edu.au
Thank you!
GVL site: www.genome.edu.au
Galaxy for tutorials: galaxy-tut.genome.edu.au
Galaxy Queensland: galaxy-qld.genome.edu.au
Contributors and participants: