

Anaquin - Vignette

Ted Wong (t.wong@garvan.org.au)
May 19, 2021

Citation

- [1] Representing genetic variation with synthetic DNA standards. Nature Methods, 2017.
- [2] Spliced synthetic genes as internal controls in RNA sequencing experiments. Nature Methods, 2016.
- [3] Reference standards for next-generation sequencing. Nature Reviews, 2017.
- [4] Anaquin: a software toolkit for the analysis of spike-in controls for next generation sequencing. Bioinformatics, 2017.

Website

Visit our website to learn more about sequins: www.sequin.xyz.

Overview

In this document, we show how to conduct statistical analysis that models the performance of sequin controls in a next-generation sequencing (NGS) experiment. We call the sequins RnaQuin for RNA-Seq sequins, MetaQuin for metagenomic sequins, VarQuin for genomic variant sequins, and the statistical framework Anaquin.

This vignette is written for R usage. However, Anaquin is a framework covering the entire NGS workflow. Consequently, the R package (and its documentation) is a subset of the overall Anaquin framework. We also distribute a detailed workflow guide on our website.

It is important to note that Anaquin is both a command-line tool and an R package. Our workflow guide has the details on how the command-line tool can be used with the R package.

Sequins

Next-generation sequencing (NGS) enables rapid, cheap and high-throughput determination of sequences within a user's sample. NGS methods have been applied widely, and have fuelled major advances in the life sciences and clinical health care over the past decade. However, NGS typically generates a large amount of sequencing data that must first be analyzed and interpreted with bioinformatics tools. There is no standard way to perform an analysis of NGS data; different tools provide different advantages in different situations. The complexity and variation of sequences further compound this problem, and there are few references against which to compare next-generation sequencing and analysis.

To address this problem, we have developed a suite of synthetic nucleic-acid sequins (sequencing spike-ins). Sequins are fractionally added to the extracted nucleic-acid sample prior to library preparation, so they are sequenced along with your sample of interest. We can use the sequins as an internal quantitative and qualitative control to assess any stage of the next-generation sequencing workflow.

Figure 1: NGS workflow for sequins

Mixture

Sequins are combined across a range of concentrations to formulate a mixture. A mixture file (CSV) is a text file that specifies the concentration of each sequin within a mixture. Mixture files are often required as input to enable Anaquin to perform quantitative analysis. Mixture files can be downloaded from our website.

Let's demonstrate RnaQuin mixture A with a simple example. Load the mixture file (you can also download the file directly from our website):

    library('Anaquin')
    ## Loading required package: ggplot2

    data("RnaQuinIsoformMixture")
    head(RnaQuinIsoformMixture)
    ##       Name Length       MixA       MixB
    ## 1 R1_101_1    719  11.329650   0.472075
    ## 2 R1_101_2    430   3.776550   1.416225
    ## 3 R1_102_1   1490  13.217925   7.553100
    ## 4 R1_102_2   1362   1.888275  52.871700
    ## 5 R1_103_1   1754  60.424806 453.186000
    ## 6 R1_103_2   1856 906.372094  30.212400

Each row represents a sequin. Name gives the sequin name, Length is the length of the sequin in nucleotide bases, and MixA gives the concentration in attomol/ul for Mixture A (MixB gives the same for Mixture B).

Imagine we have two RNA-Seq experiments: a well-designed experiment and a poorly designed experiment. We would like to quantify their isoform expression. Let's simulate the experiments:

    set.seed(1234)
    sim1 <- 1.0 + 1.2*log2(RnaQuinIsoformMixture$MixA) + rnorm(nrow(RnaQuinIsoformMixture),0,1)
    sim2 <- c(1.0 + rnorm(100,1,3), 1.0 + 1.2*log2(tail(RnaQuinIsoformMixture,64)$MixA) + rnorm(64,0,1))

In the first experiment, sequins are expected to correlate linearly with the measured FPKM. Indeed, the variables are strongly correlated:

    names <- row.names(RnaQuinIsoformMixture)
    input <- log2(RnaQuinIsoformMixture$MixA)
    title <- 'Isoform expression (Good)'
    xlab  <- 'Input concentration (log2)'
    ylab  <- 'Measured FPKM (log2)'

    plotLinear(names, input, sim1, title=title, xlab=xlab, ylab=ylab)
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.

In our second experiment, the weakly expressed isoforms exhibit stochastic behavior and are clearly not linear with the input concentration. Furthermore, there is a limit of quantification (LOQ), below which the accuracy of the experiment becomes questionable.

    names <- row.names(RnaQuinIsoformMixture)
    input <- log2(RnaQuinIsoformMixture$MixA)
    title <- 'Isoform expression (Bad)'
    xlab  <- 'Input concentration (log2)'
    ylab  <- 'Measured FPKM (log2)'

    plotLinear(names, input, sim2, title=title, xlab=xlab, ylab=ylab)
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.

The primary observation is that the artificial scale imposed by sequins allows us to quantify our experiments.

Quantifying transcriptome assembly

To quantify RNA-Seq transcriptome assembly, we need to run a transcriptome assembler: software that assembles transcripts and estimates their abundances. Our workflow guide has the details.

Here, we use a dataset generated by Cufflinks, described in Section 5.4.5.1 of the user guide:

    data(UserGuideData_5.4.5.1)
    head(UserGuideData_5.4.5.1)
    ##             Input       Sn
    ## R1_101_1  10.0708 0.990264
    ## R1_101_2   5.0354 0.393023
    ## R1_102_1   0.8886 0.519463
    ## R1_102_2  14.2176 0.902349
    ## R1_103_1 107.4220 0.995439
    ## R1_103_2 859.3750 0.904095

The first column gives the input concentration for each sequin in attomol/ul. The second column is the measured sensitivity. Run the following R code to generate a sensitivity plot.

    title <- 'Assembly Plot'
    xlab  <- 'Input Concentration (log2)'
    ylab  <- 'Sensitivity'

    # Sequin names
    names <- row.names(UserGuideData_5.4.5.1)

    # Input concentration
    x <- log2(UserGuideData_5.4.5.1$Input)

    # Measured sensitivity
    y <- UserGuideData_5.4.5.1$Sn

    plotLogistic(names, x, y, title=title, xlab=xlab, ylab=ylab, showLOA=TRUE)
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.

The fitted logistic curve reveals a clear relationship between input concentration and sensitivity. Unsurprisingly, the assembler has higher sensitivity for highly expressed isoforms. The limit of assembly (LOA) is defined as the point where the fitted curve intersects a sensitivity of 0.70.

Quantifying gene expression

Quantifying gene/isoform expression involves building a linear model between input concentration and measured FPKM. In this section, we consider a dataset generated by Cufflinks, described in Section 5.4.6.3 of the user guide.

Load the dataset:

    data(UserGuideData_5.4.6.3)
    head(UserGuideData_5.4.6.3)
    ##            Input   Observed1   Observed2   Observed3
    ## R1_101   15.1062    0.958838    1.456650    0.960190
    ## R1_102   15.1062    0.806596    0.604539    0.652783
    ## R1_103  966.7970    2.650470    2.890570    3.211090
    ## R1_11   241.6990    3.876010    3.919950    4.246390
    ## R1_12    30.2124    0.779118    0.898644    0.733175
    ## R1_13  7734.3800 1305.710000 1328.950000 1358.970000

The first column gives the input concentration for each sequin in attomol/ul. The other columns are the FPKM values for each replicate (three replicates in total). The following code will quantify the first replicate:

    title <- 'Gene Expression'
    xlab  <- 'Input Concentration (log2)'
    ylab  <- 'FPKM (log2)'

    # Sequin names
    names <- row.names(UserGuideData_5.4.6.3)

    # Input concentration
    x <- log2(UserGuideData_5.4.6.3$Input)

    # Measured FPKM
    y <- log2(UserGuideData_5.4.6.3$Observed1)

    plotLinear(names, x, y, title=title, xlab=xlab, ylab=ylab, showLOQ=TRUE)
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.

The coefficient of determination is over 0.90; that is, over 90% of the variation (e.g. technical bias) can be explained by the model. The LOQ is 3.78 attomol/ul; this is the estimated empirical detection limit.

We can also quantify multiple replicates:

    title <- 'Gene Expression'
    xlab  <- 'Input Concentration (log2)'
    ylab  <- 'FPKM (log2)'

    # Sequin names
    names <- row.names(UserGuideData_5.4.6.3)

    # Input concentration
    x <- log2(UserGuideData_5.4.6.3$Input)

    # Measured FPKM
    y <- log2(UserGuideData_5.4.6.3[,2:4])

    plotLinear(names, x, y, title=title, xlab=xlab, ylab=ylab, showLOQ=TRUE)
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$ymin` is discouraged. Use `ymin` instead.
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$ymax` is discouraged. Use `ymax` instead.

Differential analysis

In this section, we show how to assess differential expression analysis by comparing the expected fold change with the measured fold change. We apply our method to a dataset described in Section 5.6.3 of the user guide.

    data(UserGuideData_5.6.3)
    head(UserGuideData_5.6.3)
    ##        ExpLFC    ObsLFC       SD         Pval         Qval         Mean Label
    ## R1_101     -3 -1.890122 0.701723 7.069675e-03 2.056337e-02     9.953556    TP
    ## R1_102     -4 -2.051777 0.546374 1.731616e-04 7.646243e-04    17.285262    TP
    ## R1_103     -1  3.837784 0.377602 2.883289e-24 6.534028e-23  1221.301532    TP
    ## R1_11      -4 -2.431582 0.591352 3.924117e-05 1.974336e-04    47.174250    TP
    ## R1_12       1  1.542757 0.425562 2.887104e-04 1.214989e-03    73.008720    TP
    ## R1_13       0  0.717701 0.242493 3.079564e-03 1.000416e-02 44053.259914    FP

For each sequin gene, we have the expected log-fold change, measured log-fold change, standard deviation, p-value, q-value and mean. The estimation was done by DESeq2. Run the following code to construct a fold-change plot:

    title <- 'Gene Fold Change'
    xlab  <- 'Expected fold change (log2)'
    ylab  <- 'Measured fold change (log2)'

    # Sequin names
    names <- row.names(UserGuideData_5.6.3)

    # Expected log-fold
    x <- UserGuideData_5.6.3$ExpLFC

    # Measured log-fold
    y <- UserGuideData_5.6.3$ObsLFC

    plotLinear(names, x, y, title=title, xlab=xlab, ylab=ylab, showAxis=TRUE, showLOQ=FALSE)
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.
    ## Warning: Use of `data$x` is discouraged. Use `x` instead.
    ## Warning: Use of `data$y` is discouraged. Use `y` instead.

Outliers are obvious throughout the reference scale. Overall, DESeq2 is able to account for 78% of the variation.

We can also construct a ROC plot. [1] has details on how the true positives and false positives are defined.

    title <- 'ROC Plot'

    # Sequin names
    seqs <- row.names(UserGuideData_5.6.3)

    # Expected ratio
    ratio <- UserGuideData_5.6.3$ExpLFC

    # How the ROC points are ranked (scoring function)
    score <- 1-UserGuideData_5.6.3$Pval

    # Classified labels (TP/FP)
    label <- UserGuideData_5.6.3$Label

    plotROC(seqs, score, ratio, label, title=title, refGroup=0)

AUC statistics for LFC 3 and 4 are higher than for LFC 1 and 2. Overall, all LFC ratios can be correctly classified relative to LFC 0.

Furthermore, we can construct limit-of-detection ratio (LOD) curves:

    xlab  <- 'Average Counts'
    ylab  <- 'P-value'
    title <- 'LOD Curves'

    # Measured mean
    mean <- UserGuideData_5.6.3$Mean

    # Expected log-fold
    ratio <- UserGuideData_5.6.3$ExpLFC

    # P-value
    pval <- UserGuideData_5.6.3$Pval

    qval <- UserGuideData_5.6.3$Qval

    plotLOD(mean, pval, abs(ratio), qval=qval, xlab=xlab, ylab=ylab, title=title, FDR=0.05)

Unsurprisingly, the p-value is inversely quadratically related to the average counts. All the LFC ratios systematically outperform LFC 0. The function also estimates the empirical detection limits; [1] has the details.
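The ranking idea behind the ROC plot earlier in this section can be illustrated in base R. The sketch below is a simplification under stated assumptions: it uses only the six rows printed by head() for the Section 5.6.3 dataset, it pools all fold-change groups together instead of comparing each ratio against a reference group as plotROC does with refGroup, and it computes the area under the curve with the rank-sum (Mann-Whitney) identity. The helper name auc_rank is hypothetical and is not part of Anaquin.

```r
# Six rows copied from the head() output of UserGuideData_5.6.3
# (an excerpt only, not the full dataset).
pval  <- c(7.069675e-03, 1.731616e-04, 2.883289e-24,
           3.924117e-05, 2.887104e-04, 3.079564e-03)
label <- c("TP", "TP", "TP", "TP", "TP", "FP")

# Same scoring function as in the ROC example: higher score = stronger call.
score <- 1 - pval

# Hypothetical helper (not an Anaquin function): AUC via the rank-sum identity,
# i.e. the fraction of (TP, FP) pairs in which the TP has the higher score.
auc_rank <- function(score, label, positive = "TP") {
  is_pos <- label == positive
  n_pos  <- sum(is_pos)
  n_neg  <- sum(!is_pos)
  r      <- rank(score)  # ties receive average ranks
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

auc <- auc_rank(score, label)
print(auc)
```

On these six rows, four of the five TP sequins outrank the single FP sequin, so the pooled value is 0.8. This number only illustrates the calculation; it says nothing about the per-ratio AUC statistics that plotROC reports for the full dataset.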
