Uploaded On 2016-06-11

Presentation Transcript

Slide1

Bioinformatics for Whole-Genome Shotgun Sequencing of Microbial Communities

By Kevin Chen and Lior Pachter, PLoS Computational Biology, 2005

David Kelley

Slide2

State of metagenomics

In July 2005, 9 projects had been completed.
General challenges were becoming apparent.
Paper focuses on computational problems.

Slide3

Assembling communities

Goal:
  Retrieval of nearly complete genomes from the environment
Challenges:
  Need sufficient read depth: species must be prominent
  Avoid mis-assembling across species while maximizing contig size

Slide4

Comparative assembly

Align all reads to a closely related “reference” genome
Infer contigs from read alignments
Rearrangements limit effectiveness
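The "infer contigs from read alignments" step can be sketched as interval merging on the reference. This is only an illustration, not the method of Pop et al.: the function name `infer_contigs`, the `min_overlap` parameter, and the half-open coordinate convention are all assumptions, and the read aligner itself is left out.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the reference (assumed convention)


def infer_contigs(alignments: List[Interval], min_overlap: int = 1) -> List[Interval]:
    """Merge read alignments into contig intervals on the reference.

    Two consecutive alignments are joined when they share at least
    `min_overlap` reference bases (a hypothetical threshold).
    """
    contigs: List[Interval] = []
    for start, end in sorted(alignments):
        if contigs:
            prev_start, prev_end = contigs[-1]
            overlap = min(end, prev_end) - start
            if overlap >= min_overlap:
                # Read overlaps the current contig enough: extend it.
                contigs[-1] = (prev_start, max(prev_end, end))
                continue
        # No sufficient overlap: the read starts a new contig.
        contigs.append((start, end))
    return contigs


# Toy alignments: two groups of reads separated by an uncovered gap.
reads = [(0, 100), (80, 180), (300, 400), (390, 450)]
print(infer_contigs(reads, min_overlap=10))
```

A rearrangement between the sample and the reference would show up here as reads that align in the wrong order or not at all, which is why the slide lists it as a limit on effectiveness.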

Pop M. et al. Comparative genome assembly. Briefings in Bioinformatics 2004.

Slide5

“Assisted” Assembly

De novo assembly
Complement by aligning reads to reference genome(s)
Short overlaps can be trusted
Single mate links can be trusted
Mis-assemblies can be detected

Gnerre S. et al. Assisted assembly: how to improve a de novo genome assembly by using related species. Genome Biology 2009.

Slide6

Assisted Assembly

Gnerre S. et al. Assisted assembly: how to improve a de novo genome assembly by using related species. Genome Biology 2009.

Slide7

Assisted Assembly

Gnerre S. et al. Assisted assembly: how to improve a de novo genome assembly by using related species. Genome Biology 2009.

Slide8

Assisted Assembly

Gnerre S. et al. Assisted assembly: how to improve a de novo genome assembly by using related species. Genome Biology 2009.

Slide9

Metagenomics application

Pros:
  Low coverage species
  If conservative, unlikely to hurt
Cons:
  Exotic microbes may have no good references
  Potential to propagate mis-assemblies

Slide10

Overlap-layout-consensus

Species-level:
  Increased polymorphism
  Reads come from different individuals
  Missed overlaps
System-level:
  Homologous sequence
  False overlaps

Slide11

Polymorphic diploid eukaryotes

Reads sequenced from 2 chromosomes
Single reference sequence expected
Keep duplications separate
Keep polymorphic haplotypes together

Slide12

Strategy 1

Form contigs aggressively
Detect alignments between contigs and resolve
Avoid merging duplications by respecting mate pair distances

Jones, T. et al. The diploid genome sequence of Candida albicans. PNAS 2004.

Slide13

Strategy 2

Assemble chromosomes separately
Erase overlaps with splitting rule

Vinson et al. Assembly of polymorphic genomes: Algorithms and application to Ciona savignyi. Genome Research 2005.

Slide14

Back to metagenomics

Strategy 1:
  Assemble aggressively
  Detect mis-assemblies and fix
Strategy 2:
  Separate reads or filter overlaps

Slide15

Binning

Presence of informative genes
  E.g. 16S rRNA
Machine learning
  K-mers
  Codon bias
Worked well only with big scaffolds
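As a rough sketch of the k-mer features mentioned above, the snippet below turns a sequence into a normalized k-mer frequency vector that a classifier or clustering method could consume. The function name `kmer_profile` and the default `k = 4` are assumptions for illustration; the slide does not say which k or which learning method was used.

```python
from collections import Counter
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int = 4) -> List[float]:
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    Windows containing characters other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        # Sequence shorter than k, or no valid k-mers at all.
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Toy usage with dinucleotides on a short made-up sequence.
profile = kmer_profile("ACGTACGT", k=2)
print(len(profile), profile[1])  # 16 features; index 1 is the "AC" frequency
```

A short contig yields only a few k-mer windows, so its profile is noisy, which is consistent with the observation that binning worked well only with big scaffolds.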

Lots of progress in this area since 2005

Slide16

Abundances

Depth of read coverage suggests relative abundance of species in sample
Difficult if polymorphism is significant
  Separate individuals → too low
  Merge species → too high
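A minimal sketch of the depth-to-abundance idea, assuming reads have already been assigned to species and genome sizes are known. The function name `relative_abundance`, the species labels, and all numbers are hypothetical.

```python
from typing import Dict


def relative_abundance(read_counts: Dict[str, int],
                       genome_sizes: Dict[str, int],
                       read_length: int) -> Dict[str, float]:
    """Estimate relative abundance from mean read depth per species.

    depth = reads * read_length / genome_size; abundances are depths
    normalized to sum to 1. Assumes reads are sampled uniformly.
    """
    depth = {
        sp: read_counts[sp] * read_length / genome_sizes[sp]
        for sp in read_counts
    }
    total = sum(depth.values())
    if total == 0:
        return {sp: 0.0 for sp in depth}
    return {sp: d / total for sp, d in depth.items()}


# Same number of reads, but species B has a genome twice as large,
# so its depth (and estimated abundance) is half that of species A.
counts = {"A": 3000, "B": 3000}
sizes = {"A": 2_000_000, "B": 4_000_000}
print(relative_abundance(counts, sizes, read_length=700))
```

If reads from one polymorphic species are split into separate bins, each bin's count drops and the estimate is too low; if two species are merged into one bin, the count is inflated and the estimate is too high.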

Depends on good classification

Slide17

How much sequencing

G = genome size (or sum of genomes)
c = global coverage
k = local coverage
$n_k$ = bp w/ coverage k

Slide18

Poisson model

“Interval” = $[x - l_r,\ x]$
“Events” = read starts
“λ” = coverage
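Under this model, the number of reads covering a position is approximately Poisson with mean equal to the coverage. The sketch below applies that to the G, c, k, and n_k notation from the "How much sequencing" slide. It reads l_r as the read length, which is an assumption, and it ignores edge effects and uneven sampling; the function names are made up for illustration.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_bases_at_coverage(G: int, c: float, k: int) -> float:
    """Expected n_k: number of bp covered by exactly k reads,
    for total genome size G and global coverage c."""
    return G * poisson_pmf(k, c)


def coverage_for_fraction(fraction_covered: float) -> float:
    """Global coverage c at which the expected covered fraction
    1 - exp(-c) reaches fraction_covered (must be < 1)."""
    return -math.log(1.0 - fraction_covered)


# Hypothetical community: 2 Mb of total genome sequenced to 3x coverage.
G, c = 2_000_000, 3.0
for k in range(4):
    print(k, round(expected_bases_at_coverage(G, c, k)))
print(round(coverage_for_fraction(0.99), 2))
```

In a mixed community each species has its own effective coverage, so a rare species can remain largely unsequenced even when the global coverage looks adequate.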

[Figure: the interval from $x - l_r$ to $x$]

Slide19

Gene Finding

Focus on genes, rather than genomes
Bacterial gene finders are very accurate
Assemble and run on scaffolds
BLAST leftover reads against protein db

Slide20

Partial genes

Tested GLIMMER on simulated 10 Kb contigs
Many genes crossed borders
GLIMMER often predicted a truncated version
Gene finding models could be adjusted to account for this case

Slide21

Gene-centric analysis

Cluster genes by orthology
  Orthology refers to genes in different species that derive from a common ancestor
Express sample as vector of abundances

Slide22

UPGMA on KEGG vectors

Slide23

PCA on KEGG vectors

Principal components may correspond to interesting pathways or functions

Slide24

How much sequencing

N = # genes in community
f = fraction found
Coupon collector’s problem

Slide25

Phylogeny

Apply multiple sequence alignment and phylogeny reconstruction to gene sequences

Slide26

Partial sequences

Bad for common MSA programs
Semi-global alignment is required

Slide27

Supertree methods

Construct tree from multiple subtrees
Split gene into segments?
Construct subtree on sequences that align fully to segment?

Slide28

Thanks!