Anaquin-Vignette
Ted Wong (t.wong@garvan.org.au)
May 19, 2021

Citation
[1] Representing genetic variation with synthetic DNA standards. Nature Methods, 2017.
[2] Spliced synthetic genes as internal controls in RNA sequencing experiments. Nature Methods, 2016.
[3] Reference standards for next-generation sequencing. Nature Reviews, 2017.
[4] Anaquin: a software toolkit for the analysis of spike-in controls for next generation sequencing. Bioinformatics, 2017.

Website
Visit our website to learn more about sequins: www.sequin.xyz.

Overview
In this document, we show how to conduct statistical analysis that models the performance of sequin controls in next-generation sequencing (NGS) experiments. We call the RNA-Seq sequins RnaQuin, the metagenomic sequins MetaQuin, the genomic variant sequins VarQuin, and the statistical framework Anaquin. This vignette is written for R users. However, Anaquin is a framework covering the entire NGS workflow. Consequently, the R package (and its documentation) is a subset of the overall Anaquin framework.
We also distribute a detailed workflow guide on our website. It is important to note that Anaquin is both a command-line tool and an R package. Our workflow guide has the details on how the command-line tool can be used with the R package.

Sequins
Next-generation sequencing (NGS) enables rapid, cheap and high-throughput determination of sequences within a user's sample. NGS methods have been applied widely, and have fuelled major advances in the life sciences and clinical healthcare over the past decade. However, NGS typically generates a large amount of sequencing data that must first be analyzed and interpreted with bioinformatics tools. There is no standard way to analyze NGS data; different tools offer different advantages in different situations. The complexity and variation of sequences further compound this problem, and there are few references against which to compare next-generation sequencing and analysis.

To address this problem, we have developed a suite of synthetic nucleic-acid sequins (sequencing spike-ins). Sequins are added in small fractions to the extracted nucleic-acid sample prior to library preparation, so they are sequenced along with your sample of interest. We can use the sequins as an internal quantitative and qualitative control to assess any stage of the next-generation sequencing workflow.

Figure 1: NGS workflow for sequins

Mixture
Sequins are combined across a range of concentrations to formulate a mixture. A mixture file (CSV) is a text file that specifies the concentration of each sequin within a mixture. Mixture files are often required as input to enable Anaquin to perform quantitative analysis, and can be downloaded from our website.

Let's demonstrate RnaQuin mixture A with a simple example. Load the mixture file (you can also download the file directly from our website):

    library('Anaquin')
    ## Loading required package: ggplot2
    data("RnaQuinIsoformMixture")
    head(RnaQuinIsoformMixture)
    ##       Name Length       MixA       MixB
    ## 1 R1_101_1   1719  11.329650   0.472075
    ## 2 R1_101_2    430   3.776550   1.416225
    ## 3 R1_102_1   1490  13.217925   7.553100
    ## 4 R1_102_2   1362   1.888275  52.871700
    ## 5 R1_103_1   1754  60.424806 453.186000
    ## 6 R1_103_2   1856 906.372094  30.212400

Each row represents a sequin. Name gives the sequin names, Length is the length of the sequins in nucleotide bases, and MixA and MixB give the concentrations in attomol/ul for Mixture A and Mixture B, respectively.

Imagine we have two RNA-Seq experiments: a well-designed experiment and a poorly-designed experiment. We would like to quantify their isoform expression. Let's simulate the experiments:

    set.seed(1234)
    # Well-designed: log2 FPKM is linear in log2 input concentration, plus unit noise
    sim1 <- 1.0 + 1.2 * log2(RnaQuinIsoformMixture$MixA) +
            rnorm(nrow(RnaQuinIsoformMixture), 0, 1)
    # Poorly-designed: the first 100 sequins are pure noise; only the last 64 follow the linear trend
    sim2 <- c(1.0 + rnorm(100, 1, 3),
              1.0 + 1.2 * log2(tail(RnaQuinIsoformMixture, 64)$MixA) + rnorm(64, 0, 1))

In the first experiment, the measured FPKM of the sequins is expected to correlate linearly with the input concentration. Indeed, the two variables are strongly correlated:

    names <- row.names(RnaQuinIsoformMixture)
    input <- log2(RnaQuinIsoformMixture$MixA)
    title <- 'Isoform expression (Good)'
    xlab  <- 'Input concentration (log2)'
    ylab  <- 'Measured FPKM (log2)'
    plotLinear(names, input, sim1, title=title, xlab=xlab, ylab=ylab)
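To make "strongly correlated" concrete, the linear relationship can be summarised by the slope and the coefficient of determination (R²) of an ordinary least-squares fit. The following is a minimal base-R sketch, not Anaquin's plotLinear implementation. It assumes hypothetical, evenly spaced log2 input concentrations (logInput) instead of the RnaQuin mixture, mirrors the noise model of sim1 and sim2, and, as a simplification, places the noisy sequins at the lowest concentrations. The helper name linearSummary is hypothetical.

```r
# Minimal sketch (base R only): summarise how linear a sequin response is.
# ASSUMPTION: logInput is a hypothetical set of evenly spaced log2 input
# concentrations, not the RnaQuin mixture used in the vignette.
set.seed(1234)
n        <- 164
logInput <- seq(-3, 15, length.out = n)

# Well-designed experiment: linear response plus unit Gaussian noise
good <- 1.0 + 1.2 * logInput + rnorm(n, 0, 1)
# Poorly-designed experiment: the 100 lowest inputs are pure noise
bad  <- c(1.0 + rnorm(100, 1, 3),
          1.0 + 1.2 * logInput[101:n] + rnorm(n - 100, 0, 1))

# Hypothetical helper: fit measured ~ input, return slope and R^2
linearSummary <- function(x, y) {
  fit <- lm(y ~ x)
  c(slope = unname(coef(fit)[2]), r2 = summary(fit)$r.squared)
}

good_stats <- linearSummary(logInput, good)
bad_stats  <- linearSummary(logInput, bad)
print(rbind(good = good_stats, bad = bad_stats))
```

The well-designed simulation should give a slope close to 1.2 and an R² near 1, whereas the noisy low-concentration sequins pull down the R² of the poorly-designed simulation.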
In our second experiment, the weakly expressed isoforms exhibit stochastic behavior and are clearly not linear with the input concentration. Furthermore, there is a limit of quantification (LOQ), below which the accuracy of the experiment becomes questionable.

    names <- row.names(RnaQuinIsoformMixture)
    input <- log2(RnaQuinIsoformMixture$MixA)
    title <- 'Isoform expression (Bad)'
    xlab  <- 'Input concentration (log2)'
    ylab  <- 'Measured FPKM (log2)'
    plotLinear(names, input, sim2, title=title, xlab=xlab, ylab=ylab)

The primary observation is that the artificial scale imposed by sequins allows us to quantify our experiments.

Quantifying transcriptome assembly
To quantify RNA-Seq transcriptome assembly, we need to run a transcriptome assembler, i.e. software that assembles transcripts and estimates their abundances. Our workflow guide has the details. Here, we use a dataset generated by Cufflinks, described in Section 5.4.5.1 of the user guide:

    data(UserGuideData_5.4.5.1)
    head(UserGuideData_5.4.5.1)
    ##             Input       Sn
    ## R1_101_1  10.0708 0.990264
    ## R1_101_2   5.0354 0.393023
    ## R1_102_1   0.8886 0.519463
    ## R1_102_2  14.2176 0.902349
    ## R1_103_1 107.4220 0.995439
    ## R1_103_2 859.3750 0.904095

The first column gives the input concentration for each sequin in attomol/ul. The second column is the measured sensitivity. Run the following R code to generate a sensitivity plot:

    title <- 'Assembly Plot'
    xlab  <- 'Input Concentration (log2)'
    ylab  <- 'Sensitivity'
    # Sequin names
    names <- row.names(UserGuideData_5.4.5.1)
    # Input concentration
    x <- log2(UserGuideData_5.4.5.1$Input)
    # Measured sensitivity
    y <- UserGuideData_5.4.5.1$Sn
    plotLogistic(names, x, y, title=title, xlab=xlab, ylab=ylab, showLOA=TRUE)

The fitted logistic curve reveals a clear relationship between input concentration and sensitivity. Unsurprisingly, the assembler has higher sensitivity for highly expressed isoforms. The limit of assembly (LOA) is defined as the input concentration at which the fitted curve reaches a sensitivity of 0.70.

Quantifying gene expression
Quantifying gene/isoform expression involves building a linear model between input concentration and measured FPKM. In this section, we consider a dataset generated by Cufflinks, described in Section 5.4.6.3 of the user guide. Load the dataset:

    data(UserGuideData_5.4.6.3)
    head(UserGuideData_5.4.6.3)
    ##            Input   Observed1   Observed2   Observed3
    ## R1_101   15.1062    0.958838    1.456650    0.960190
    ## R1_102   15.1062    0.806596    0.604539    0.652783
    ## R1_103  966.7970    2.650470    2.890570    3.211090
    ## R1_11   241.6990    3.876010    3.919950    4.246390
    ## R1_12    30.2124    0.779118    0.898644    0.733175
    ## R1_13  7734.3800 1305.710000 1328.950000 1358.970000

The first column gives the input concentration for each sequin in attomol/ul. The other columns are the FPKM values for each replicate (three replicates in total). The following code quantifies the first replicate:

    title <- 'Gene Expression'
    xlab  <- 'Input Concentration (log2)'
    ylab  <- 'FPKM (log2)'
    # Sequin names
    names <- row.names(UserGuideData_5.4.6.3)
    # Input concentration
    x <- log2(UserGuideData_5.4.6.3$Input)
    # Measured FPKM
    y <- log2(UserGuideData_5.4.6.3$Observed1)
    plotLinear(names, x, y, title=title, xlab=xlab, ylab=ylab, showLOQ=TRUE)

The coefficient of determination is over 0.90: over 90% of the variation in measured FPKM is explained by the input concentration, with the remainder attributable to, e.g., technical bias. The LOQ is 3.78 attomol/ul; this is the estimated empirical limit of quantification.

We can also quantify multiple replicates:

    title <- 'Gene Expression'
    xlab  <- 'Input Concentration (log2)'
    ylab  <- 'FPKM (log2)'
    # Sequin names
    names <- row.names(UserGuideData_5.4.6.3)
    # Input concentration
    x <- log2(UserGuideData_5.4.6.3$Input)
    # Measured FPKM (one column per replicate)
    y <- log2(UserGuideData_5.4.6.3[,2:4])
    plotLinear(names, x, y, title=title, xlab=xlab, ylab=ylab, showLOQ=TRUE)

Differential analysis
In this section, we show how to compare expected and measured fold changes in a differential expression analysis. We apply our method to a dataset described in Section 5.6.3 of the user guide:

    data(UserGuideData_5.6.3)
    head(UserGuideData_5.6.3)
    ##        ExpLFC    ObsLFC       SD         Pval         Qval         Mean Label
    ## R1_101     -3 -1.890122 0.701723 7.069675e-03 2.056337e-02     9.953556    TP
    ## R1_102     -4 -2.051777 0.546374 1.731616e-04 7.646243e-04    17.285262    TP
    ## R1_103     -1  3.837784 0.377602 2.883289e-24 6.534028e-23  1221.301532    TP
    ## R1_11      -4 -2.431582 0.591352 3.924117e-05 1.974336e-04    47.174250    TP
    ## R1_12       1  1.542757 0.425562 2.887104e-04 1.214989e-03    73.008720    TP
    ## R1_13       0  0.717701 0.242493 3.079564e-03 1.000416e-02 44053.259914    FP

For each sequin gene, we have the expected log-fold change, measured log-fold change, standard deviation, p-value, q-value, mean and classification label (TP/FP). The estimation was done by DESeq2. Run the following code to construct a fold-change plot:

    title <- 'Gene Fold Change'
    xlab  <- 'Expected fold change (log2)'
    ylab  <- 'Measured fold change (log2)'
    # Sequin names
    names <- row.names(UserGuideData_5.6.3)
    # Expected log-fold change
    x <- UserGuideData_5.6.3$ExpLFC
    # Measured log-fold change
    y <- UserGuideData_5.6.3$ObsLFC
    plotLinear(names, x, y, title=title, xlab=xlab, ylab=ylab, showAxis=TRUE, showLOQ=FALSE)

Outliers are obvious throughout the reference scale. Overall, DESeq2 is able to account for 78% of the variation.

We can also construct a ROC plot; [1] has details on how the true positives and false positives are defined.

    title <- 'ROC Plot'
    # Sequin names
    seqs <- row.names(UserGuideData_5.6.3)
    # Expected ratio
    ratio <- UserGuideData_5.6.3$ExpLFC
    # How the ROC points are ranked (scoring function)
    score <- 1 - UserGuideData_5.6.3$Pval
    # Classified labels (TP/FP)
    label <- UserGuideData_5.6.3$Label
    plotROC(seqs, score, ratio, label, title=title, refGroup=0)

The AUC statistics for LFC 3 and 4 are higher than for LFC 1 and 2. Overall, all LFC ratios can be correctly classified relative to LFC 0. Furthermore, we can construct limit of detection ratio (LOD) curves:

    xlab  <- 'Average Counts'
    ylab  <- 'P-value'
    title <- 'LOD Curves'
    # Measured mean counts (renamed to avoid confusion with base::mean)
    meanCounts <- UserGuideData_5.6.3$Mean
    # Expected log-fold change
    ratio <- UserGuideData_5.6.3$ExpLFC
    # P-value
    pval <- UserGuideData_5.6.3$Pval
    # Q-value
    qval <- UserGuideData_5.6.3$Qval
    plotLOD(meanCounts, pval, abs(ratio), qval=qval, xlab=xlab, ylab=ylab, title=title, FDR=0.05)

Unsurprisingly, the p-value has an inverse quadratic relationship with average counts. All the LFC ratios systematically outperform LFC 0. The function also estimates the empirical detection limits; see [1] for details.
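The AUC statistics discussed above summarise how well the scoring function (1 - p-value) separates true positives from false positives. As a rough illustration, the AUC of a single ROC curve equals the probability that a randomly chosen TP outranks a randomly chosen FP, which can be computed from ranks (the Mann-Whitney identity). The sketch below uses base R with hypothetical p-values and labels and a hypothetical helper, rocAUC. It is not how plotROC is implemented: plotROC reports one AUC per LFC group against the reference group (refGroup=0), whereas this computes a single AUC.

```r
# Minimal sketch (base R only): ROC AUC via the Mann-Whitney rank-sum identity.
# ASSUMPTION: rocAUC, pval and label are hypothetical; they are not part of
# Anaquin and not taken from UserGuideData_5.6.3.
rocAUC <- function(score, label, positive = "TP") {
  pos   <- label == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r     <- rank(score)   # ties receive their average rank
  # Fraction of (positive, negative) pairs where the positive scores higher
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

pval  <- c(1e-6, 1e-4, 0.02, 0.30, 0.01, 0.50, 0.80)  # hypothetical p-values
label <- c("TP", "TP", "TP", "TP", "FP", "FP", "FP")  # hypothetical labels
score <- 1 - pval                                     # same scoring rule as above

auc <- rocAUC(score, label)
print(auc)  # [1] 0.8333333 (10 of the 12 TP/FP pairs are ordered correctly)
```

To approximate a per-group AUC in the spirit of refGroup=0, one could subset the data to the sequins of a single LFC group plus the LFC 0 sequins before calling the helper.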