/
A Comprehensive Workflow for Microbial Genome Sequencing A Comprehensive Workflow for Microbial Genome Sequencing

A Comprehensive Workflow for Microbial Genome Sequencing - PowerPoint Presentation

mitsue-stanley
mitsue-stanley . @mitsue-stanley
Follow
385 views
Uploaded On 2016-04-04

A Comprehensive Workflow for Microbial Genome Sequencing - PPT Presentation

From Swab to Publication Madison I Dunitz 1 David A Coil 1 Jenna M Lang 1 Guillaume Jospin 1 Aaron E Darling 2 Jonathan A Eisen 1 UC Davis Genome Center 1 University of California Davis ID: 273723

genome assembly 16s sequencing assembly genome sequencing 16s tree sequence library preparation genomes phylogenetic workflow verification reads organism sanger

Share:

Link:

Embed:

Download Presentation from below link

Download Presentation The PPT/PDF document "A Comprehensive Workflow for Microbial G..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

Slide1

A Comprehensive Workflow for Microbial Genome Sequencing

From Swab to Publication

Madison I. Dunitz1, David A. Coil1, Jenna M. Lang1, Guillaume Jospin1, Aaron E. Darling2, Jonathan A. Eisen1

UC Davis Genome Center1University of California, Davis; 2ithree Institute, University of Sydney, Australia

The sequencing and

de novo assembly of microbial genomes has already yielded enormous scientific insight revolutionizing a diverse collection of fields, from epidemiology to ecology. In the past two decades increasing advances in DNA sequencing technology have led to the creation of a wide variety of options for DNA library preparation, sequencing and assembly.  Each option comes with its own advantages and disadvantages in terms of complexity, expense, computing power, time, and experience required.  This workflow was designed in an attempt to democratize the process of microbial sequencing and de novo assembly, in order to make them accessible to low funded labs or even classrooms on a massive scale.

GenBank Submission

Making a Phylogenetic Tree

Annotation

Library Preparation and Sequencing

Assembly

Verification of the Assembly

Identify/Choose Microbe

Sanger Sequence Processing

Bench Work

Swab

Plate

Dilution Streak (X2)

Overnight Culture

DNA Extraction

16s PCR

Create a BioProject at NCBI

FASTA2AGP (custom script)

Create a .sbt Template

Tbl2asn

Create a Whole Genome Shotgun Submission Submitting Raw Reads

SeqTraceGeneiousCustom Script

OptionsRAST

Obtain the Full-Length 16S SequenceCreate an RDP AlignmentBuilding the TreeViewing the Tree

Library PreparationKit OptionsConsiderations in Library PreparationMultiplexing

Download and Install A5Running A5

Interpretation of A5 statsVerification of 16S SequencePhylosift

BLAST 16SInterpret the ResultsGOLDAlign the 16S Sequences using Align Sequences Nucleotide BLAST

Introduction

The objective of the present study was to design, test, troubleshoot, and publish a comprehensive workflow for taking a researcher from a swab to a microbial genome publication; enabling even a lab with limited resources and bioinformatics experience to perform it.

What Do You Think?

Fill out a post-it note to let us know

Results

Bench Work

We assume a starting point of wanting to isolate an organism from a particular environment and needing to identify it. Users starting with a known organism should proceed to "Library Preparation and Sequencing”.

Here

we cover the steps necessary to take a sample through plating, dilution streaking, overnight growth, creating a glycerol stock, 16s PCR and preparation for Sanger sequencing.

Sanger

Sequence ProcessingIn this section we identify multiple software programs that allow the researcher to view and edit the genetic sequence. We detail the advantages and disadvantages of particular programs and explain how to quality trim the reads, reverse complement and align the reads and generate a consensus sequence. This is easiest to do visually via a chromatogram allowing the user to see the trace and process the sequences manually.Identify/Choose MicrobeIn a classroom or undergraduate research setting the researchers may not have a particular bacterial species in mind. In this case it is necessary to screen the 16S Sanger sequencing results for possible genome project candidates. We recommend starting with the BLAST results, then continuing onto the Genomes Online Database (GOLD), and simply Google searching. In many cases it will be handy to first build a phylogenetic tree to aid in identification since the 16S results may not beLibrary Preparation and SequencingThe choice of sequencing technology and of library preparation method for genome sequencing is ever-changing and much-debated. Here we recommend using Illumina MiSeq for reasons of cost, depth of coverage, and length of reads. Furthermore, the assembly pipeline, A5-miseq, requires Illumina data and is optimized for the longer reads from the MiSeq. AssemblyGenome assembly typically consists of data cleaning (quality filtering and adaptor removal), error correction, contig assembly, scaffolding, and verification of scaffolds/contigs. There are a large array of programs that can perform some, or most of these steps. These programs include commercial and open-source options.For this workflow we recommend using the open source A5 assembly pipeline which automates all of the steps described above with a single command .

Verification of the AssemblyThere are three portions to the verification of a genome assembly. The first is the overall "quality" of the assembly, assessed by examining the stats provided by A5 (e.g. number of contigs and contig N50). The second is verification that the organism sequenced is the organism of interest, simply by checking the 16S sequence with BLAST. The third is the "completeness” of the genome, which is difficult to measure except in cases where a close reference (representative example of the species’ genome) is available. AnnotationThere are a number of different pipelines available for the annotation of bacterial genomes. These include Prokka, IMG, RAST, PGAP and others. Genomic annotation involves identifying protein coding regions and attempting to predict the genes being coded for and their biological function.Making a Phylogenetic TreeThere are two points during the workflow where making a 16S phylogenetic tree may be useful. The first is after identification of candidate organisms by Sanger sequencing and the second is after assembly of the genome. The process is identical in both cases, but the additional length and improved quality of the post-assembly 16S sequence may generate a better tree. The tree can be used for identification of the candidate (e.g. is the candidate found in a single species clade), for naming of the candidate (does it fall in a clade containing only members of that species, and other members of the species are not found outside that clade), and for placement of the organism into a phylogenetic context.The outline of this step, is to use the Ribosomal Database Project (RDP) to generate an alignment of the sequence with close relative and an out-group, following by cleanup of the RDP headers, tree-building with FastTree and viewing/analysis of the tree in Dendroscope.GenBank SubmissionThis section describes how to submit contigs and scaffolds (if applicable) as a Whole Genome Shotgun (WGS) submission to Embank. We also recommend allowing Embank to annotate the genome themselves, since submitting RAST annotations to GenBank can be prohibitively complicated. The genomes are automatically shared with the DNA Data Bank of Japan (DDBJ) and EBML. In addition, genomes from GenBank are automatically pulled into IMG, and are annotated there as well.