RESEARCH ARTICLE  Open Access

A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers

Michael A Quail*, Miriam Smith, Paul Coupland, Thomas D Otto, Simon R Harris, Thomas R Connor, Anna Bertoni, Harold P Swerdlow and Yong Gu

Abstract

Background: Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy.

Results: Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform.

Conclusions: All three fast turnaround sequencers evaluated here were able to generate usable sequence. However, there are key differences between the quality of that data and the applications it will support.
Keywords: Next-generation sequencing, Ion torrent, Illumina, Pacific biosciences, MiSeq, PGM, SMRT, Bias, Genome coverage, GC-rich, AT-rich

*Correspondence: mq1@sanger.ac.uk
Wellcome Trust Sanger Institute, Hinxton, UK

© 2012 Quail et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Quail et al. BMC Genomics 2012, 13:341
http://www.biomedcentral.com/1471-2164/13/341

Background

Sequencing technology is evolving rapidly and during the course of 2011 several new sequencing platforms were released. Of note were the Ion Torrent Personal Genome Machine (PGM) and the Pacific Biosciences (PacBio) RS that are based on revolutionary new technologies.

The Ion Torrent PGM "harnesses the power of semiconductor technology" detecting the protons released as nucleotides are incorporated during synthesis [1]. DNA fragments with specific adapter sequences are linked to and then clonally amplified by emulsion PCR on the surface of 3-micron diameter beads, known as Ion Sphere Particles. The templated beads are loaded into proton-sensing wells that are fabricated on a silicon wafer and se- sequence. As sequencing proceeds, each of the four bases is introduced sequentially. If bases of that type are incorporated, protons are released and a signal is detected proportional to the number of bases incorporated.

PacBio have developed a process enabling single molecule real time (SMRT) sequencing [2]. Here, DNA polymerase molecules, bound to a DNA template, are attached to the bottom of 50 nm-wide wells termed zero-mode waveguides (ZMWs). Each polymerase is allowed to carry out second strand DNA synthesis in the presence of γ-phosphate fluorescently labeled nucleotides. The width of the ZMW is such that light cannot propagate through the waveguide, but energy can penetrate a short distance and excite the fluorophores attached to those nucleotides that are in the vicinity of the polymerase at the bottom of the well. As each base is incorporated, a distinctive pulse of fluorescence is detected in real time.

In recent years, the sequencing industry has been dominated by Illumina, who have adopted a sequencing-by-synthesis approach [3], utilizing fluorescently labeled reversible-terminator nucleotides, on clonally amplified DNA templates immobilized to an acrylamide coating on the surface of a glass flowcell. The Illumina Genome Analyzer and more recently the HiSeq 2000 have set the standard for high throughput massively parallel sequencing, but in 2011 Illumina released a lower throughput fast-turnaround instrument, the MiSeq, aimed at smaller laboratories and the clinical diagnostic market.

Here we evaluate the output of these new sequencing platforms and compare them with the data obtained from the Illumina HiSeq and GAIIx platforms. Table 1 gives a summary of the technical specifications of each of these instruments.

Sequence generation

Platform specific libraries were constructed for a set of microbial genomes: Bordetella pertussis (67.7% GC, with some regions in excess of 90% GC content), Salmonella Pullorum (52% GC), Staphylococcus aureus (33% GC) and Plasmodium falciparum (19.3% GC, with some regions close to 0% GC content). We routinely use these to test new sequencing technologies, as together their sequences represent the range of genomic landscapes that one might encounter.

PCR-free [4] Illumina libraries were uniquely barcoded, pooled and run on a MiSeq flowcell with paired 150 base reads plus a 6-base index read and also on a single lane of an Illumina HiSeq with paired 75 base reads plus an 8-base index read (Additional file 1: Table S1). Illumina libraries prepared with amplification using Kapa HiFi polymerase [5] were run on a single lane of an Illumina GAIIx with paired
76 base reads plus an 8-base index read and on a MiSeq flowcell with paired 150 base reads plus a 6-base index read.

PCR-free libraries represent an improvement over the standard Illumina library preparation method as they result in more even sequence coverage [4] and are included here alongside libraries prepared with PCR in order to enable comparison to PacBio, which has an amplification-free workflow.

Ion Torrent libraries were each run on a single 316 chip for 65 cycles, generating mean read lengths of 124 bases (Additional file 1: Table S2). Standard PacBio libraries, with an average of 2 kb inserts, were run individually over multiple SMRT cells, each using C1 chemistry, and providing 20x sequence coverage data for each genome (Additional file 1: Table S3).

The datasets generated were mapped to the corresponding reference genome as described in Methods. For a fair comparison, all sequence datasets were randomly down-sampled (normalized) to contain reads representing a 15x average genome coverage.

Table 1 Technical specifications of Next Generation Sequencing platforms utilised in this study

Platform | Illumina MiSeq | Ion Torrent PGM | PacBio RS | Illumina GAIIx | Illumina HiSeq 2000
Instrument cost* | $128K | $80K** | $695K | $256K | $654K
Sequence yield per run | 1.5-2 Gb | 20-50 Mb on 314 chip, 100-200 Mb on 316 chip, 1 Gb on 318 chip | 100 Mb | 30 Gb | 600 Gb
Sequencing cost per Gb* | $502 | $1000 (318 chip) | $2000 | $148 | $41
Run time | 27 hours*** | 2 hours | 2 hours | 10 days | 11 days
Reported accuracy | Mostly Q30 | Mostly Q20 | Q10 | Mostly Q30 | Mostly
Observed raw error rate | 0.80% | 1.71% | 12.86% | 0.76% | 0.26%
Read length | up to 150 bases | ~200 bases | Average 1500 bases**** (C1 chemistry) | up to 150 bases | up to 150 bases
Paired reads | Yes | Yes | No | Yes | Yes
Insert size | up to 700 bases | up to 250 bases | up to 10 kb | up to 700 bases | up to 700 bases
Typical DNA requirements | 50-1000 ng | 100-1000 ng | ~1 µg | 50-1000 ng | 50-1000 ng

*All cost calculations are based on list price quotations obtained from the manufacturer and assume expected sequence yield stated.
**System price including PGM, server, OneTouch and OneTouch ES.
***Includes two hours of cluster generation.
****Mean mapped read length includes adapter and reverse strand sequences. Subread lengths, i.e. the individual stretches of sequence originating from the sequenced fragment, are significantly shorter.

All the platforms have library preparation protocols that involve fragmenting genomic DNA and attaching specific adapter sequences. Typically this takes somewhere between 4 and 8 hours for one sample. In addition, the Ion Torrent template preparation has a two hour emulsion PCR and a template bead enrichment step.

In the battle to become the platform with the fastest turnaround time, all the manufacturers are seeking to streamline library preparation protocols. Life Technologies have developed the Ion Xpress Fragment Library Kit that has an enzymatic Fragmentase formulation for shearing starting DNA, thereby avoiding the labour of physical shearing and potentially enabling complete library automation. We tested this kit on our four genomes alongside the standard library kit with physical shearing and found both to give equal genomic representation (see Additional file 2: Figure S1 for results obtained with P. falciparum). Illumina purchased Epicentre in order to package the Nextera technology with the MiSeq. Nextera uses a transposon to shear genomic DNA and simultaneously introduce adapter sequences [6]. The Nextera method can produce sequencing-ready DNA in around 90 minutes and gave us remarkably even genome representation (Additional file 2: Figures S2 and S3) with B. pertussis and S. aureus, but produced a very biased sequence dataset from the extremely AT-rich P. falciparum genome.

Genome coverage and GC bias

To analyse the uniformity of coverage across the genome we tabulated the depth of coverage seen at each position of the genome. We utilized the coverage plots described by Lam et al. [7] that depict the percentage of the genome that is covered at a given read depth, and genome coverage at different read depths, respectively, for each dataset (Figure 1), alongside the ideal theoretical coverage that would be predicted based on Poisson behaviour.

In the context of the GC-rich genome of B. pertussis most platforms gave similar uniformity of sequence coverage, with the Ion Torrent data giving slightly more uneven coverage. In the S. aureus genome the PGM performed better. The PGM gave very biased coverage when sequencing the extremely AT-rich P. falciparum genome (Figure 1). This effect was also evident when we plotted coverage depth against GC content (Additional file 2: Figure S4). Whilst the PacBio platform gave a sequence dataset with quite even coverage on GC and extremely AT-rich contexts, it did demonstrate slight but noticeable unevenness of coverage and bias towards GC-rich sequences with the S. aureus genome. With the GC-neutral S. Pullorum genome all platforms gave equal coverage with unbiased GC representation (data not shown).

The most dramatic observation from our results was the severe bias seen when sequencing the extremely AT-rich genome of P. falciparum on the PGM. The result of this was deeper than expected coverage of the GC-rich and subtelomeric regions and poor coverage within introns and AT-rich exonic segments (Figure 2), with approximately 30% of the genome having no sequence coverage whatsoever. This bias was observed with libraries prepared using both enzymatic and physical shearing (Additional file 2: Figure S1).

In a recent study to investigate the optimal enzyme for next generation library preparation [5], we found that the enzyme used for fragment amplification during next generation library preparation can have a significant influence on bias. We found the enzyme Kapa HiFi amplifies fragments with the least bias, giving even coverage, close to that obtained without amplification. Since the PGM has two amplification steps, one during library preparation and the other emulsion PCR (emPCR) for template amplification, we reasoned that this might be the cause of the observed bias. Substituting the supplied Platinum Taq enzyme with Kapa HiFi for the nick translation and amplification step during library preparation profoundly reduced the observed bias (Figure 3). We were unable to further improve this by use of Kapa HiFi for the emPCR (results not shown).

Of the four genomes sequenced, the P. falciparum genome is the largest and most complex and contains a significant quantity of repetitive sequences. We used P. falciparum to analyse the effect of read length versus mappability. As the PacBio pipeline doesn't generate a mapping quality value and to ensure a fair comparison, we remapped the reads of all technologies using the k-mer based mapper, SMALT [9], and then analysed coverage across the P. falciparum genome (Additional file 3: Table S4). This data confirms the poor performance of Ion Torrent on the P. falciparum genome, as only 65% of the genome is covered with high quality (Q20) reads compared to ~98-99% for the other platforms. Whilst the mean mapped read length of the PacBio reads with this genome was 1336 bases, average subread length (the length of sequence covering the genome) is significantly less (645 bases). The short average subread length is due to preferential loading of short fragment constructs in the library and the effect of lag time (non-imaged bases) after sequencing initiation, the latter resulting in sequences near the beginning of library constructs not being reported.

As the median length of the PacBio subreads for this dataset is just 600 bases, we compared their coverage with an equivalent amount of in silico filtered reads of 620 bases. This led to a very small decrease in the percentage of bases covered. Using paired reads on the Illumina MiSeq, however, gave a strong positive effect, with 1.1% more coverage being observed from paired-end reads compared to single-end reads.

Error rates

We observed error rates of below 0.4% for the Illumina platforms, 1.78% for Ion Torrent and 13% for PacBio sequencing (Table 1). The percentage of error-free reads, without a single mismatch or indel, was 76.45%, 15.92% and 0% for MiSeq, Ion Torrent and PacBio, respectively. The error heatmap in Figure 2A shows that the PacBio errors are distributed evenly over the chromosome. We manually inspected the regions where Ion Torrent and Illumina generated more errors. Illumina produced errors after long (>20-base) homopolymer tracts [10] (Figure 4A).
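The two summary figures quoted above, the aligned error rate and the percentage of error-free reads, can be illustrated with a minimal sketch. It assumes, hypothetically, that each mapped read has already been reduced to a pair (number of mismatched or indel bases, number of mapped bases). The function name and the toy values are illustrative only and are not taken from the SMALT-based analysis used in this study.

```python
from typing import Iterable, Tuple


def summarize_errors(reads: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    """Return (error rate in %, error-free reads in %).

    Each item in `reads` is a hypothetical (errors, mapped_bases) pair for
    one mapped read; this input format is an assumption of the sketch.
    """
    total_errors = 0
    total_bases = 0
    n_reads = 0
    n_error_free = 0
    for errors, mapped_bases in reads:
        total_errors += errors
        total_bases += mapped_bases
        n_reads += 1
        if errors == 0:
            # A read counts as error-free only with no mismatch and no indel.
            n_error_free += 1
    if n_reads == 0 or total_bases == 0:
        raise ValueError("no mapped bases to summarize")
    error_rate = 100.0 * total_errors / total_bases
    error_free = 100.0 * n_error_free / n_reads
    return error_rate, error_free


# Toy input (invented values): four reads, two of them without any error.
toy_reads = [(0, 150), (2, 150), (0, 100), (6, 100)]
error_rate, error_free = summarize_errors(toy_reads)
```

Keeping the two metrics separate matters because they can diverge: a platform whose errors cluster in a few reads can show a moderate error rate and still a high share of error-free reads.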
Also evident in the MiSeq data were strand errors due to the GGC motif [11]. Following the finding that the motif GGC generates strand-specific errors, we analyzed this phenomenon in the MiSeq data for P. falciparum (Additional file 4: Table S5). We observed that the error is mostly generated by GC-rich motifs, principally GGCGGG. We found no evidence for an error if the triplet after the GGC is AT-rich. Other MiSeq datasets also showed this artifact (data not shown). In addition to this being a strand-specific issue, it appears that this is a read-specific phenomenon. Whilst there is a quality drop in the first read following these GC-rich motifs, there is a striking loss of quality in read 2, where the reads have nearly half the mean quality value compared to the read 1 reads for GC-rich triplets that follow the GGC motif. We could observe this low quality in read 2 in all our analysed Illumina lanes. For AT-rich motifs the ratio is nearly 1 (1.03).

Ion Torrent didn't generate reads at all for long (>14-base) homopolymer tracts, and could not predict the correct number of bases in homopolymers >8 bases long. Very few errors were observed following short homopolymer stretches in the MiSeq data (Figure 4B). Additionally, we observed strand-specific errors in the PGM data but were unable to associate these with any obvious motif (Figure 4C).

Figure 1 Genome coverage plots for 15x depth randomly downsampled sequence coverage from the sequencing platforms tested. A) The percentage of the B. pertussis genome covered at different read depths; B) The number of bases covered at different depths for B. pertussis; C) The percentage of the S. aureus genome covered at different read depths; D) The number of bases covered at different depths for S. aureus; E) The percentage of the P. falciparum genome covered at different read depths; and F) The number of bases covered at different depths for P. falciparum.

SNP calling

In order to determine whether or not the higher error rates observed with the PGM and PacBio affected their ability to call SNPs, we aligned the reads from the S.
aureus genome, for which all platforms gave good sequence representation, against the reference genome of the closely related strain USA300_FPR3757 [12], and compared the SNPs called against those obtained by aligning the reference sequences of the two genomes (Figure 5 and Additional file 5: Table S6). In order to create a fair comparison we initially used the same randomly normalized 15x datasets used in our analysis of genome coverage, which according to the literature [3] is sufficient to accurately call heterozygous variants, but found that this was insufficient for the PacBio datasets, where a 190x coverage was used.

Overall the rate of SNP calling was slightly higher for the Ion Torrent data than for Illumina data (chi square p value 3.15E-08), with approximately 82% of SNPs being correctly called for the PGM and 68-76% of the SNPs being detected from the Illumina data (Figure 5A). Conversely, the rate of false SNP calls was higher with Ion Torrent data than for Illumina data (Figure 5B). SNP calling from PacBio data proved more problematic, as existing tools are optimized for short-read data and not for high error-rate long-read data. We were reliant on SNPs called by the SMRT portal pipeline for this analysis. Our results showed that SNP detection from PacBio data was not as accurate as that from the other platforms, with overall only 71% of SNPs being detected and 2876 SNPs being falsely called (Additional file 5: Table S6).

Amongst the datasets obtained from the Illumina sequencers, the percentage of correct SNP calls was higher for the MiSeq (76%) than for the GAIIx (70%) or the HiSeq (69%), despite the same libraries being run on both MiSeq and HiSeq. The use of Nextera library preparation gave similar results with 76% of SNPs being correctly called. It should be noted that we found the inbuilt automatic variant calling inadequate on both MiSeq and PGM, with MiSeq reporter calling just 6.6% of variants and Torrent suite 1.5.1 calling only 1.4% of variants.

Figure 2 Artemis genome browser [8] screenshots illustrating the variation in sequence coverage of a selected region of P. falciparum chromosome 11, with 15x depth of randomly normalized sequence from the platforms tested. In each window, the top graph shows the percentage GC content at each position, with the numbers on the right denoting the minimum, average and maximum values. The middle graph in each window is a coverage plot for the dataset from each instrument; the colour code is shown above graph A). Each of the middle graphs shows the depth of reads mapped at each position, and below that in B-D are the coordinates of the selected region in the genome with gene models on the (+) strand above and (−) strand below. A) View of the first 200 kb of chromosome 11. Graphs are smoothed with window size of 1000. A heat map of the errors, normalized by the amount of mapping reads, is included just below the GC content graph (PacBio top line, PGM middle and MiSeq bottom). B) Coverage over region of extreme GC content, ranging from 70% to 0%. C) Coverage over the gene PF3D7_1103500. D) Example of intergenic region between genes PF3D7_1104200 and PF3D7_1104300. The window size of B, C and D is 50 bp.

Discussion

A key feature of these new platforms is their speed. Decreasing run time has clear advantages, particularly within the clinical sequencing arena, but poses challenges in itself. Whilst manufacturers may state library prep times on the order of a couple of hours, these times don't include upfront QC and library QC and quantification. Also, typical library prep times quoted usually apply to processing of only one sample; i.e., pipetting time is largely ignored. Purchasers of sequencing instruments will want to keep them running at full utilization, in order to maximize their investment, and will also want to pool multiple samples on single runs for economic reasons. To obtain maximum throughput, users must consider the whole process, potentially investing in ancillary equipment and robotics to create an automated pipeline for the preparation of large numbers of samples.
To process large numbers of samples quickly, a facility's instrument base must be large enough to avoid sample backlogs. With this in mind, manufacturers are seeking to develop more streamlined sample-prep protocols that will facilitate timely sample loading. Here we have tested two such developments: enzymatic fragmentation and the Nextera technique. We conclude that these methods can be very useful, but users must carefully evaluate the methods they use for their particular applications and for use with genomes of extreme base composition to avoid bias.

Whilst the data generated using the Ion Torrent PGM platform has a higher raw error rate (~1.8%) than Illumina data (<0.4%), provided there is sufficient coverage, the representation and ability to call SNPs is quite closely matched between these technologies, with more true positives being called from PGM data but far fewer false positives from the Illumina data. Detection of SNPs using PacBio data was not as accurate; the use of single-molecule sequencing to detect low level variants and quasispecies within populations remains unproven. We have found PacBio's long reads useful for scaffolding de novo assemblies, though our experience suggests that this is currently not fully optimized and extensive method development is still required.

Figure 3 The effect of substituting Platinum HiFi PCR supermix with Kapa HiFi in the PGM library prep amplification step. A) The percentage of the P. falciparum genome covered at different read depths. The blue line shows the data obtained with the recommended Platinum enzyme and the green line with Kapa HiFi. The red line depicts ideal coverage behavior. B) The number of bases covered at different depths. C) Sequence representation vs. GC-content plots.
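The "ideal coverage" curve in the coverage figures is described in the Results as the coverage predicted from Poisson behaviour. A minimal sketch of how such a reference curve could be computed for a 15x dataset follows. The function name and the chosen depths are illustrative assumptions; this is not the authors' code.

```python
import math


def poisson_fraction_at_least(depth: int, mean_depth: float) -> float:
    """Return P(X >= depth) for X ~ Poisson(mean_depth).

    Under ideal (unbiased) sequencing this is the expected fraction of
    genome positions covered by at least `depth` reads.
    """
    if depth <= 0:
        return 1.0
    below = 0.0
    term = math.exp(-mean_depth)  # P(X = 0)
    for k in range(depth):
        below += term                  # accumulate P(X = k)
        term *= mean_depth / (k + 1)   # step from P(X = k) to P(X = k + 1)
    return max(0.0, 1.0 - below)


# Ideal percentage of the genome covered at depth >= d for 15x mean coverage.
ideal_curve = [
    (d, 100.0 * poisson_fraction_at_least(d, 15.0)) for d in range(0, 31, 5)
]
```

Plotting an observed "percentage of genome covered at depth d or more" curve against `ideal_curve` gives the kind of comparison the figures make: the further the observed curve falls below the Poisson curve at low depths, the more uneven the coverage.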
Interestingly, the mappability didn't increase significantly with longer reads, although a beneficial effect was obtained from having mate-pair information. Current PacBio protocols favor the preferential loading of smaller constructs, resulting in average subread lengths that are significantly shorter than the often quoted average read lengths. Further development is therefore required to avoid having excess short fragments and adapter-dimer constructs in the library and to reduce their loading efficiency into the ZMWs.

Whilst one would normally use higher coverage than used here for confident SNP detection (i.e., 30-40x depth), we were limited to 15x depth due to the yield of some of the platforms. Nonetheless, at least for the haploid genome, S. aureus, 15x coverage should be a reasonable quantity for SNP detection and even in the human genome, 15x coverage has been shown to be sufficient to accurately call heterozygous SNPs [3].

Variant calling is a highly subjective process; the particular software chosen as well as the specific parameters employed to make the predictions will change the results substantially. As such, the rates of both true SNP and false positive calling provided here are purely indicative and results obtained with each sequencing platform will vary. For any particular application using a specific sequencing method, optimisation of the SNP- and indel-calling algorithm would always be recommended.

We sequence many isolates of the malaria parasite P.
falciparum as it represents a significant health issue in developing countries; this organism leads to several million deaths per annum. There are several active large sequencing programs (e.g. MalariaGEN [13]) that are currently aiming to sequence thousands of clinical malaria samples. As the malaria genome has a GC content of only 19.4% [14], we use it as one of our test genomes, representing a significant challenge to most sequencing technologies. Based on the present study, use of Illumina sequencing technology with libraries prepared without amplification [4] leads to the least biased coverage across this genome. Ion Torrent semiconductor sequencing is not recommended for sequencing of extremely AT-rich genomes, due to the severe coverage bias observed. This is likely to be an artifact introduced during amplification. Therefore, avoidance of library amplification and/or emPCR, or use of more faithful enzymes during emPCR, may eliminate the bias.

Figure 4 Illustration of platform-specific errors. The panels show Artemis BAM views with reads (horizontal bars) mapping to defined regions of chromosome 11 of P. falciparum from PacBio (P; top), Ion Torrent (I; middle) and MiSeq (M; bottom). Red vertical dashes are 1 base differences to the reference and white points are indels. A) Illustration of errors in Illumina data after a long homopolymer tract. Ion Torrent data has a drop of coverage and multiple indels are visible in PacBio data. B) Example of errors associated with short homopolymer tracts. Multiple insertions are visible in the PacBio data, deletions are observed in the PGM data and the MiSeq sequences read generally correct through the homopolymer tract. C) Example of strand specific deletions (red circles) observed in Ion Torrent data.

Illumina sequencing has matured to the point where a great many applications [15-24] have been developed on the platform. Since the PGM is also a massively parallel short-read technology, many of those applications should translate well and be equally practicable. There are a few obvious exceptions; techniques involving manipulations on the flowcell such as FRT-seq [21] and OS-Seq [22] will be impossible using semiconductor sequencing. Also, the Ion Torrent platform currently employs fragment lengths of 100 or 200 bases; without a mate-pair type library protocol, these insert sizes are perhaps too short to enable accurate de novo assemblies such as that demonstrated using ALLPATHS-LG for mammalian genomes using Illumina data [25]. Conversely, Illumina sequencing on the HiSeq or MiSeq instruments requires heterogeneous base composition across the population of imaged clusters [26]. In order to sequence monotemplates (where most sequenceable fragments have exactly the same sequence), it is often necessary to significantly dilute or mix the sample with a complex genomic library to enable registration of clusters.
B) Ion TorrentIon Torrent, GAIIMiSeqHiSeqMiSeq_NexteraPacBio 190x Total false positives Excluding mobile elements and indels GAIIMiSeqHiSeqon MiSeqPacBio 190x All SNPs Excluding mobile elements and indelsPercentage of correctly called true SNPs Number of incorrect SNP calls Figure5AccuracyofSNPdetectionfromtheS.aureusdatasetsgeneratedfromeachplatform,comparedagainstthereferencegenomeofitscloserelativeS.aureusUSA300_FPR3757.BoththeTorrentservervariantcallingpipelineandSAMtoolswereusedforIonTorrentdata;SAMtoolswasusedforIlluminadataandSMRTportalpipelineforPacBiodata.)ThepercentageofSNPsdetectedusingeachplatformoverall(bluebar),andoutsideofrepeats,indelsandmobilegeneticelements(redbar).)ThenumberofincorrectSNPcallsforeachplatformoverall(bluebar),andoutsideofrepeats,indelsandmobilegeneticelements(redbar).etal.BMCGenomicsPage8of13http://www.biomedcentral.com/1471-2164/13/341 Semiconductorsequencingdoesnotsufferthisproblem.TheDNA-inputrequirementsofPacBiocanbepro-hibitory.IlluminaandPGMlibrarypreparationcanbeperformedwithfarlessDNA;thestandardPGMIonEXpresslibrarypreprequiresjust100ngDNAandIlluminasequencinghasbeenperformedfromsub-nanogramquantities[27].Theyield,sample-inputrequirementsandamplification-freelibraryprepofPacBiopotentiallymakeitunsuitableforcountingapplicationsandforapplicationsinvolvingsignificantpriorenrichmentsuchasexomesequencing[15]andChIP-seq[18].ThePacBioplatform,byvirtueofitslongreadlengths,shouldhoweverhaveapplicationindenovosequencingandmayalsobenefitanalysisoflinkageofalternativesplicingandinofvariantsacrosslongamplicons.Furthermore,thepotentialfordirectdetectionofepigeneticmodificationshasbeendemon-strated[28].Finally,itshouldbenotedthatthusstudyrepresentsapointintime,utilisingkitsandreagentsavailableupuntiltheendof2011.IonTorrentandPacificBios-ciencesarerelativelynewsequencingtechnologiesthathavenothadtimetomatureinthesamewaythattheIlluminatechnologyhas.Weanticipatethatwhilstsomeoftheissuesidentifiedmaybeintrinsic,otherswillberesolvedastheseplatf
ormsevolve.Thelimitedyieldandhighcostperbasecurrentlypro-hibitlargescalesequencingprojectsonthePacificBios-ciencesinstrument.ThePGMandMiSeqarequitecloselymatchedintermsofutilityandeaseofworkflow.Thedecisiononwhethertopurchaseoneortheotherwillhingeonavailableresources,existinginfrastructureandpersonalexperience,availablefinancesandthetypeofapplicationsbeingconsidered.GenomicDNAP.falciparum3D7genomicDNAwasagiftfromProfChrisNewbold,UniversityofOxford,UK.BordetellaST24genomicDNAwasagiftfromCraigCummings,StanfordUniversitySchoolofMedicine,CA.StaphylococcusaureusTW20genomicDNAwasagiftfromJodiLindsay,StGeorgesHospitalMedicalSchool,UniversityofLondon.PullorumS449/87genomicDNAwaspreparedattheWellcomeTrustSangerInsti-tute,UK.IlluminalibraryconstructionDNA(0.5gin120lof10mMTrisHClpH8.5)wasshearedinanAFAmicrotubeusingaCovarisS2device(CovarisInc.)withthefollowingsettings:Dutycycle20,Intensity5,cycles/burst200,45seconds.ShearedDNAwaspurifiedbybindingtoanequalvol-umeofAmpurebeads(BeckmanCoulterInc.)andelutedin32lof10mMTrisHCl,pH8.5.End-repair,A-tailingandpaired-endadapterligationwereper-formed(aspertheprotocolssuppliedbyIllumina,Inc.usingreagentsfromNewEnglandBiolabs-NEB)withpurificationusinga1.5:1ratioofstandardAmpuretosamplebetweeneachenzymaticreaction.PCR-freeli-brarieswereconstructedaccordingtoKozarewaetal.[4].Afterligation,excessadaptersandadapterdimerswereremovedusingtwoAmpureclean-ups,firstwitha1.5:1ratioofstandardAmpuretosample,followedbya0.7:1ratioofAmpurebeads.PCRfreelibrarieswerethenusedasis.Librariespreparedwithamplificationweredilutedto2ng/land1lwasusedastemplateforPCRamplificationwithKapaHiFi[5]2xmastermix(KK2601,KapaBiosystems).PCRreactionswereper-formedin0.2lthin-wallmicrotubesonanMJtetradthermalcyclerwiththefollowingconditions:94°Cminutes;14cyclesof94°C20seconds,65°Cseconds,72°C30seconds;72°C-3minuteswith200nMfinalconcentrationofstandardPE1.0andmodi-fiedmultiplexingPE2.0primers[5].AfterPCR,excessprimersandanyprimerdimerwereremovedusingtwoAmpureclean-ups,witha1.5:1r
atioofAmpurethenwitha0.8:1ratioofAmpurebeads.Alllibrarieswerequantifiedbyreal-timePCRusingtheSYBRFastIlluminaLibraryQuantificationKit(KapaBiosystems)andpooledsoastogiveequalgenomecoveragefromeachlibrary.IlluminasequencingEachmultiplexedlibrarypoolwassequencedon:i)anIlluminaGAIIxinstrumentfor76cyclesfromeachendplusan8base-indexsequenceread,usingversion2chemistry,ii)anIlluminaMiSeqfor151cyclesfromeachendplusan8base-indexsequenceread,iii)anIllu-minaHiSeq2000instrumentfor75cyclesfromeachendplusan8base-indexsequenceread,usingversion3chemistry.SummarysequencingstatisticsaregiveninAdditionalfile1:TableS1.Allrunshaderrorrates,andassociatedsequencequality,thatsurpassedthemini-mumIlluminaspecifications.IontorrentlibrarypreparationsequencingFortheB.pertussisS.aureusP.falciparumomes,librarypreparationwascarriedoutusingtheIonXpressFragmentLibraryKit,with100ngofDNA.Adapterligation,sizeselection,nickrepairandamplifi-cation(8cyclesforB.pertussisS.aureus,6cyclesP.falciparum)wereperformedasdescribedintheIonTorrentprotocolassociatedwiththekit(Ionetal.BMCGenomicsPage9of13http://www.biomedcentral.com/1471-2164/13/341 
XpressFragmentLibraryKit-PartNumber4469142Rev.B).ForthePullorumgenome,librarypreparationwasundertakenusingtheIonFragmentLibraryKitwithgofDNA.TheDNAwasfragmentedbyadaptivefo-cusedacousticsusingaCovarisS2(CovarisInc.)withAFAtubesasdescribedintheprotocol(PartNumber4467320Rev.A).Endrepair,adapterligation,nickrepairandamplification(8cycles)werealsoperformedasdescribedintheIonTorrentprotocol.SizeselectionwasperformedusingtheLabChipXT(CaliperLifeSciences)andtheLabChipXTDNA750AssayKit(CaliperLife-Sciences),withcollectionbetween175bpand220bp.TheAgilent2100Bioanalyzer(AgilentTechnologies)andtheassociatedHighSensitivityDNAkit(AgilentTechnologies)wereusedtodeterminequalityandcon-centrationofthelibraries.TheamountoflibraryrequiredfortemplatepreparationwascalculatedusingtheTemplateDilutionFactorcalculationdescribedintheprotocol.EmulsionPCRandenrichmentstepswerecarriedoutusingtheIonXpressTemplateKitandassociatedprotocol(PartNumber4469004Rev.B).IonSpherePar-ticlequalityassessmentwascarriedoutasoutlinedinanearlierversionofthisprotocol(PartNumber4467389Rev.B)forallsamplesbecausenobenefitwasseenwithusingtheIonSphereQualityControlKitasrecom-mendedinthelaterversionoftheprotocol.TheoligosusedforthisanalysiswerepurchasedfromIDT(Inte-gratedDNATechnologiesInc.).AssessmentoftheIonSphereParticlequalitywasundertakenbetweentheemulsionPCRandenrichmentstepsonly.IontorrentsequencingSequencingwasundertakenusing316chipsinallcasesandbarcodingwasnotusedforthesesamples.TheIonSequencingKitv2.0wasusedforallsequencingreac-tions,followingtherecommendedprotocol(PartNum-ber4469714Rev.B)andTorrentSuite1.5wasusedforallanalyses.SummarysequencingstatisticsaregiveninAdditionalfile1:TableS2.PacBiolibraryconstructionDNA(2.0-10gin200l10mMTrisHClpH8.5)wasshearedinanAFAclearmini-tubeusingaCovarisS2device(CovarisInc.)withthefollowingsettings:Dutycycle20,Intensity0.1,cycles/burst1000,600seconds.ShearedDNAwaspurifiedbybindingto0.6Xvolumeofpre-washedAMPureXPbeads(BeckmanCoulterInc.),asperPacBioprotocol000-710-821-DRAFT(fi
vetimesinpurifiedwater,onetimeinEB,reconstitutedinori-ginalsupernatant)andelutedinEBto60ng/l.TheshearedDNAwasquantifiedonanAgilent2100Bioana-lyzerusingthe7500kit.1gofshearedDNAwasend-repairedusingthePacBioDNATemplatePrepKit1.0(PartNumber001-322-716)andincubatedfor15minat25°Cpriortoanother0.6XAMPureXPcleanup,elutingin30lEB.Bluntadapterswereligatedbeforeexonucle-aseincubationwascarriedoutinordertoremoveallun-ligatedadaptersandDNA.Finally,two0.6XAMPurebeadcleanupsareperformed-removingenzymesandadapterdimersthefinalSMRTbellsbeingelutedinlEB.FinalquantificationwascarriedoutonanAgi-lent2100Bioanalyzerwith1loflibrary.UsingtheSMRTbellconcentration(ng/l)andinsertsizepreviouslydetermined,thePacBio-providedcalcula-torwasusedtocalculatetheamountsofprimerandpolymeraseusedforthebindingreaction.UsingthePac-BioDNA/PolymeraseBindingKit1.0(PartNumber001-359-802),primersareannealedandtheproprietarypolymeraseisboundformingtheBindingComplexTheBindingComplexcanbestoredasalong-termstor-agemixat20°Cordilutedforimmediatesequencing.ThequantityofSMRTbelldetermineswhetheralong-termstoragemixcanbeused.Inthisinstance,therewasamplegenomicDNAfromthefourtestgenomestoallowlong-termstorage.PacBiosequencingLong-termstoragemixesweredilutedtotherequiredconcentrationandvolumewiththeprovideddilutionbufferandloadedinto96-wellplates.Theseareloadedontotheinstrument,alongwithDNASequencingKit1.0(PartNumber001-379-044)andaSMRTCell8Pac.Inallsequencingruns,2x45minmovieswerecapturedforeachSMRTCellloadedwithasinglebindingcom-plex.PrimaryfilteringanalysiswasperformedontheBladeCenterserverprovidedwiththeRSinstrument,beforethisdatawastransferredofftheBladeCenterforsecondaryanalysisinSMRTPortalusingtheSMRTana-lysispipelineversion1.2.0.1.81002.SummarysequencingstatisticsaregiveninAdditionalfile1:TableS3.ReferencegenomesEachreferencegenomewascreatedusingcapillaryse-quencedatawithmanualfinishingandareavailabletodownloadfromhttp://www.sanger.ac.uk/resources/downloads/.ThemethodsusedtosequencethegenomesP.falciparum[14]andS.
aureus TW20 [29] have been published.

Data processing
After sequencing, reads were mapped to each genome reference sequence using the manufacturers' alignment tools, tmap for PGM and blasr for PacBio (http://www.pacificbiosciences.com/products/software/algorithms). BWA [30] was used for mapping reads from the Illumina GAIIx, MiSeq and HiSeq. SAMtools [31] was then used to generate pileup and coverage information from the mapping output.

Genome coverage
We counted the number of bases in the genome that were not covered by any reads (Coverage = 0) and those with less than 5x read coverage (Coverage < 5x). SAMtools was used to generate coverage plots and bash/awk scripts were used for coverage counting.

Evenness of coverage metrics
We extracted genome coverage information from the pileup data derived by SAMtools from mapped reads after randomly downsampling to a uniform depth of 15x (this downsampling was achieved by taking from the top the number of reads required to give 15x coverage of each genome). As reads are randomly allocated, evaluation of uniformity of coverage was based on cumulative distributions over the overall average depth.

GC-content analysis
To evaluate the coverage uniformity in different genome regions, a GC profile was calculated for each dataset. All mapped reads were shredded into 50-mers and the GC-percentage in each 50-mer was calculated. The proportions of 50-mers containing a given GC-percentage were plotted against their GC percentage. A theoretical curve for each genome was also produced in the same way from its reference sequence for comparison. The difference from the theoretical curve gives an indication of GC bias.

Alignment base error analysis
The aligned error rate for data generated on the different sequencing platforms was taken from the report generated by the program SMALT [9], after aligning the dataset against its reference sequence. The error rate is calculated as the number of per-base errors within a mapped region divided by the total mapped bases in that region. An average error rate was calculated from all mapped reads for each dataset.
To quantify errors associated with specific motifs, we took the fastq file and searched all the reads for the presence of that motif. The three bases (triplets) after the motif were tabulated, and the mean quality of the following base was calculated. We did this analysis for GGC, GCC and a neutral motif, ATG.

SNP detection
SNP detection was performed using a random selection of reads to give an average depth of coverage of 15x for all platforms, except PacBio where this coverage depth was insufficient and the full dataset representing 190x coverage was used. SNPs from the PacBio reads were called using PacBio SMRT Portal software version 1.2.3. Each SMRT cell was selected for analysis against the S. aureus USA300_FPR3757 reference genome (accession number CP000255), imported into the PacBio secondary analysis protocol; the parameters can be altered for filtering, mapping, and consensus calling. SFilter.1.xml was used for filtering with a minimum allowed read length of 50 bases and a minimum read quality of 0.75 (on a PacBio-developed scale specific to RS-generated reads). BLASR.1.xml was used for mapping with the maximum number of hits per read being set to 1, a maximum divergence of 30% and minimum anchor size of 8. Finally, EviCons.1.xml was used for consensus and SNP calling.
Reads from the Illumina and Ion Torrent platforms were mapped against the S. aureus USA300_FPR3757 reference using SMALT [9]. SNPs were called using the default parameters for SAMtools mpileup followed by bcftools and the SAMtools vcfutils.pl varFilter script, as described on the SAMtools webpage (http://samtools.sourceforge.net/mpileup.shtml). SNPs were also called for the Ion Torrent data using the Torrent Suite variant calling parameters for SAMtools mpileup and bcftools followed by the Torrent Suite vcf_filter.pl script.
A set of reference SNPs was created by aligning the complete S. aureus USA300_FPR3757 genome sequence with a high-quality draft sequence for S. aureus TW20 using Mugsy [32]. A single contiguous whole-genome alignment was generated by extracting aligned blocks from the Mugsy output and then manually curating. In order to control for the effects of software-specific mis-mapping, we identified and removed from our alignment the regions corresponding to mobile genetic elements (MGEs) in the S. aureus USA300_FPR3757 genome, along with regions with no homologue in TW20. MGEs were manually identified from the S. aureus USA300_FPR3757 genome annotation. SNPs called from the resulting alignment provided a high-quality reference set for comparison with the SNPs identified by each sequencing platform. True SNPs are those that agree with the SNPs found in this reference set.
All datasets have been deposited in the ENA read archive under accession number ERP001163.

Additional files
Additional file 1: Table S1. Statistics for Illumina Sequencing Runs. Table S2. Statistics for Ion Torrent Sequencing Runs. Table S3. Statistics for PacBio Sequencing Runs.
Additional file 2: Figure S1. Comparison of the outcome of sequencing using libraries prepared using enzymatic shearing (green line) and physical shearing (blue line) on the Ion Torrent PGM. A) The percentage of the P. falciparum genome covered at different read depths; B) The number of bases covered at different depths; C) Sequence representation versus GC content. Figure S2. Genome coverage uniformity plots for 15x depth randomly normalized sequence coverage from sequencing libraries prepared using standard and Nextera Library preparation methods. A) The percentage of the B. pertussis genome covered at different read depths; B) The number of bases covered at different depths for B. pertussis; C) The percentage of the S. aureus genome covered at different read depths; D) The number of bases covered at different depths for S. aureus; E) The percentage of the P. falciparum genome covered at different read depths; and F) The number of bases covered at different depths for P. falciparum. Figure S3. Sequence representation versus GC content for 15x depth randomly normalized sequence coverage from sequencing libraries prepared using standard and Nextera Library preparation methods. Genome coverage uniformity plots for 15x depth randomly normalized sequence coverage from sequencing libraries prepared using the Illumina Nextera Library preparation kit (blue line) compared to those prepared using a standard Illumina library preparation with Kapa HiFi for library amplification (green line), on: A) B. pertussis; B) S. aureus and C) P. falciparum. Sequence representation versus GC content for 15x depth randomly normalized sequence coverage from the sequencing platforms tested, on: A) B. pertussis; B) S. aureus and C) P. falciparum.
Additional file 3: Table S4. Comparison of sequence coverage for data generated with PacBio, PGM and MiSeq across the P. falciparum genome. Reads from randomly normalized 15x datasets were remapped with SMALT to have a uniform mapping score. To analyse the utility of long reads, read length and mate-pair read analysis was also performed on 15x datasets comprising PacBio reads longer than 620 bases, and MiSeq paired- and single-end datasets with 150-base, 100-base and 50-base read lengths.
Additional file 4: Table S5. Ratios of the occurrence of quality loss after specific sequence triplets following the GGC motif. For each strand, the occurrence and subsequent mapping quality is tabulated for the GGC motif and, for comparison, another GC-rich motif GCC and the neutral motif ATG. Ratios are then given for the sequence quality observed on the forward and reverse strands following the GGC triplet and ratios of mapping quality on the same strand following GCC and ATG triplets when compared to the GGC triplet.
Additional file 5: Table S6. SNP detection statistics for S. aureus datasets versus S. aureus

Abbreviations
NGS: Next-generation sequencing; PGM: Personal genome machine; SMRT: Single molecule real time; PCR: Polymerase chain reaction; emPCR: Emulsion PCR; PE: Paired-end; qPCR: Quantitative PCR; QC: Quality Control; SNP: Single nucleotide polymorphism; Q10: 1 error in 10; Q20: 1 error in 100; Q30: 1 error in 1000.

Competing interests
The authors declare no competing financial interests.

Authors' contributions
MQ, MS, PC and AB performed the experiments and performed primary data analysis. MQ, MS, PC and HPS designed the experiments. MQ wrote the manuscript. TDO, YGU, SH and TC carried out bioinformatics analysis. All authors read and approved the final manuscript.

Acknowledgements
The authors thank the Wellcome Trust Sanger Institute core sequencing and informatics teams. This work was supported by the Wellcome Trust [grant number 098051]. TDO was supported by the European Union 7th framework

Received: 16 March 2012 Accepted: 12 July 2012 Published: 24 July 2012

References
1. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, et al: An integrated semiconductor device enabling non-optical genome sequencing.
2. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, et al: Real-time DNA sequencing from single polymerase Science
3. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, Hall KP, Evers DJ, Barnes CL, Bignell HR, et al: Accurate whole human genome sequencing using reversible terminator chemistry.
4. Kozarewa I, Ning Z, Quail MA, Sanders MJ, Berriman M, Turner DJ: Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes.
5. Quail MA, Otto TD, Gu Y, Harris SR, Skelly TF, McQuillan JA, Swerdlow HP, Oyola SO: Optimal enzymes for amplifying sequencing libraries.
6. Syed F, Grunenwald H, Caruccio N: Next-generation sequencing library preparation: simultaneous fragmentation and tagging using in vitro Nature Methods Application Note
7. Lam HYK, Clark MJ, Chen R, Chen R, Natsoulis G, O'Huallachain M, Dewey FE, Habegger L, et al: Performance comparison of whole-genome sequencing platforms. Nat Biotechnol
8. Carver T, Harris SR, Berriman M, Parkhill J, McQuillan JA: Artemis: An integrated platform for visualisation and analysis of high-throughput sequence-based experimental data. Bioinformatics 2012, 469.
9. Ponsting N, Ning Z: SMALT alignment tool. (manuscript in preparation). 2012. Software download http://www.sanger.ac.uk/resources/software/smalt/.
10. Otto TD, Sanders M, Berriman M, Newbold C: Iterative Correction of Reference Nucleotides (iCORN) using second generation sequencing
11. Nakamura K, Oshima T, Morimoto T, Ikeda S, Yoshikawa H, Shiwa Y, Ishikawa S, Linak MC, Hirai A, Takahashi H, et al: Sequence-specific error profile of Illumina sequencers. Nucleic Acids Res
12. Diep BA, Gill SR, Chang RF, Phan TH, Chen JH, Davidson MG, Lin F, Lin J, Carleton HA, Mongodin EF, et al: Complete genome sequence of USA300, an epidemic clone of community-acquired meticillin-resistant Staphylococcus aureus.
13. Achidi EA, et al: A global network for investigating the genomic epidemiology of malaria.
14. Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, et al: Genome sequence of the human malaria parasite Plasmodium falciparum.
15. Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, Zumbo P, Nayir A, Bakkaloglu A, Ozen S, Sanjad S, et al: Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci USA
16. Down TA, Rakyan VK, Turner DJ, Flicek P, Li H, Kulesha E, Graf S, Johnson N, Herrero J, Tomazou EM, et al: A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol
17. Giresi PG, Kim J, McDaniell RM, Iyer VR, Lieb JD: FAIRE (Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory elements from human chromatin. Genome Res
18. Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-wide mapping of in vivo protein-DNA interactions. Science
19. Langridge GC, Phan MD, Turner DJ, Perkins TT, Parts L, Haase J, Charles I, Maskell DJ, Peters SE, Dougan G, et al: Simultaneous assay of every Salmonella Typhi gene using one million transposon mutants.
20. Licatalosi DD, Mele A, Fak JJ, Ule J, Kayikci M, Chi SW, Clark TA, Schweitzer AC, Blume JE, Wang X, et al: HITS-CLIP yields genome-wide insights into brain alternative RNA processing.
21. Mamanova L, Andrews RM, James KD, Sheridan EM, Ellis PD, Langford CF, Ost TW, Collins JE, Turner DJ: FRT-seq: amplification-free, strand-specific transcriptome sequencing. Nat Methods
22. Myllykangas S, Buenrostro JD, Natsoulis G, Bell JM, Ji HP: Efficient targeted resequencing of human germline and cancer genomes by oligonucleotide-selective sequencing. Nat Biotechnol
23. Shao NY, Hu HY, Yan Z, Xu Y, Hu H, Menzel C, Li N, Chen W, Khaitovich P: Comprehensive survey of human brain microRNA by deep sequencing. BMC Genomics 2010, 11:409.
24. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009, 10(1):57–63.
25. Gnerre S, Maccallum I, Przybylski D, Ribeiro FJ, Burton JN, Walker BJ, Sharpe T, Hall G, Shea TP, Sykes S, et al: High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA 2011, 108(4):1513–1518.
26. Levin JZ, Yassour M, Adiconis X, Nusbaum C, Thompson DA, Friedman N, Gnirke A, Regev A: Comprehensive comparative analysis of strand-specific RNA sequencing methods. Nat Methods 2010, 7(9):709–715.
27. Adey A, Asan, Xun X, Kitzman JO, Turner EH, Stackhouse B, MacKenzie AP, Caruccio NC, Zhang X, et al: Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol 2010, 11(12):R119.
28. Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, Korlach J, Turner SW: Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 2010, 7(6):461–465.
29. Holden TG, Lindsay JA, Corton C, Quail MA, Cockfield JD, Pathak S, Batra R, Parkhill J, Bentley SD, Edgeworth JD: Genome Sequence of a Recently Emerged, Highly Transmissible, Multi-Antibiotic- and Antiseptic-Resistant Variant of Methicillin-Resistant Staphylococcus aureus, Sequence Type 239 (TW). J Bacteriology 2010, 192(3):888–892.
30. Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009, 25(14):1754–1760.
31. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009, 25(16):2078–2079.
32. Angiuoli SV, Salzberg SL: Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics 2011, 27(3):334–342.

doi:10.1186/1471-2164-13-341
Cite this article as: Quail et al.: A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers. BMC Genomics 2012 13:341.

Quail et al. BMC Genomics 2012, 13:341 http://www.biomedcentral.com/1471-2164/13/341
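
Illustrative sketch of the coverage counting and GC-profile calculations
The Methods describe counting genome positions with zero and with less than 5x coverage, and building a GC profile by shredding sequences into 50-mers. The following Python sketch shows one way such calculations might be written. It is not the authors' code: the paper states that bash/awk scripts and SAMtools output were used, and the function names, the input format (a plain list of per-base depths), the use of non-overlapping 50-mers and the rounding to whole GC percentages are all assumptions made here for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def coverage_counts(depths: Iterable[int], low: int = 5) -> Dict[str, int]:
    """Count positions with zero coverage and positions below `low` coverage.

    Assumption: `depths` holds one read depth per genome position, and the
    "below low" count also includes the zero-coverage positions.
    """
    zero = 0
    below = 0
    for depth in depths:
        if depth == 0:
            zero += 1
        if depth < low:
            below += 1
    return {"zero": zero, "below_low": below}


def gc_profile(seqs: Iterable[str], k: int = 50) -> Dict[int, float]:
    """Return the proportion of k-mers observed at each GC percentage.

    Assumption: each sequence is cut into non-overlapping k-mers, a trailing
    fragment shorter than k is discarded, and the GC percentage is rounded
    to a whole number.
    """
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for start in range(0, len(seq) - k + 1, k):
            kmer = seq[start:start + k]
            gc = kmer.count("G") + kmer.count("C")
            counts[round(100 * gc / k)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pct: n / total for pct, n in sorted(counts.items())}
```

Under these assumptions, calling gc_profile on the mapped reads and again on a list containing only the reference sequence would give the observed and theoretical curves whose difference the Methods use as an indication of GC bias.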