As cells differentiate, they undergo a process of transcriptional re-configuration, with some genes being silenced and others newly activated. While many studies have compared cells at different stages of differentiation, examining intermediate states has proven difficult, for two reasons. First, it is often not clear from cellular morphology or established markers what intermediate states exist between, for example, a precursor cell type and its terminally differentiated progeny. Moreover, two cells might transit through a different sequence of intermediate stages and ultimately converge on the same end state. Second, even cells in a genetically and epigenetically clonal population might progress through differentiation at different rates in vitro, depending on positioning and level of contacts with neighboring cells. Looking at average behavior in a group of cells is thus not necessarily faithful to the process through which an individual cell transits.

Monocle computationally reconstructs the transcriptional transitions undergone by differentiating cells. It orders a mixed, unsynchronized population of cells according to progress through the learned process of differentiation. Because the population may actually differentiate into multiple separate lineages, Monocle allows the process to branch, and can assign each cell to the correct sub-lineage. It subsequently identifies genes which distinguish different states, and genes that are differentially regulated through time. Finally, it performs clustering on all genes, to classify them according to kinetic trends.

The algorithm is inspired by and extends one proposed by Magwene et al. to time-order microarray samples [2]. Monocle differs from previous work in three ways. First, single-cell RNA-Seq data differ from microarray measurements in many ways, and so Monocle must take special care to model them appropriately at several steps in the algorithm. Secondly, the earlier algorithm assumes that samples progress along a single trajectory through expression space. However, during cell differentiation, multiple lineages might arise from a single progenitor. Monocle can find these lineage branches and correctly place cells upon them. Finally, Monocle also performs differential expression analysis and clustering on the ordered cells to help a user identify key events in the biological process of interest.

2 Single-cell expression data in Monocle

The monocle package takes a matrix of expression values, which are typically for genes (as opposed to splice variants), as calculated by Cufflinks [3] or another gene expression estimation program. Monocle assumes that gene expression values are log-normally distributed, as is typical in RNA-Seq experiments. Monocle does not normalize these expression values to control for library size, depth of sequencing, or other sources of technical variability; whichever program you use to calculate expression values should do that. Monocle is not meant to be used with raw counts, and doing so could produce nonsense results.

2.1 The CellDataSet class

monocle holds single cell expression data in objects of the CellDataSet class. The class is derived from the Bioconductor ExpressionSet class, which provides a common interface familiar to those who have analyzed microarray experiments with Bioconductor. The class requires three input files:

1. exprs, a numeric matrix of expression values, where rows are genes, and columns are cells
2. phenoData, an AnnotatedDataFrame object, where rows are cells, and columns are cell attributes (such as cell type, culture condition, day captured, etc.)
3. featureData, an AnnotatedDataFrame object, where rows are features (e.g. genes), and columns are gene attributes, such as biotype, gc content, etc.

The expression value matrix must have the same number of columns as the phenoData has rows, and it must have the same number of rows as the featureData data frame has rows. Row names of the phenoData object should match the column names of the expression matrix. Row names of the featureData object should match row names of the expression matrix.

You can create a new CellDataSet object as follows:

# not run
library(monocle)
fpkm_matrix <- read.table("fpkm_matrix.txt")
sample_sheet <- read.delim("cell_sample_sheet.txt")
gene_ann <- read.delim("gene_annotations.txt")
pd <- new("AnnotatedDataFrame", data = sample_sheet)
fd <- new("AnnotatedDataFrame", data = gene_ann)
HSMM <- new("CellDataSet", exprs = as.matrix(fpkm_matrix),
            phenoData = pd, featureData = fd)

It is often convenient to know how many cells express a particular gene, or how many genes are expressed by a given cell. Monocle provides a simple function to compute those statistics:

Monocle: Differential expression and time-series analysis for single-cell RNA-Seq and qPCR experiments

## T0_CT_A08 1472238 NA NA NA
##           num_genes_expressed Pseudotime State
## T0_CT_A01                9770      7.200     1
## T0_CT_A03                9180      2.716     1
## T0_CT_A05                8528      2.272     1
## T0_CT_A06                7096      6.461     1
## T0_CT_A07                7590      3.402     1
## T0_CT_A08                7702     20.300     2

This dataset has already been filtered using the following commands:

valid_cells <- row.names(subset(pData(HSMM),
                                Cells.in.Well == 1 & Control == FALSE &
                                Clump == FALSE & Debris == FALSE &
                                Mapped.Fragments > 1e+06))
HSMM <- HSMM[, valid_cells]

Once you've excluded cells that do not pass your quality control filters, you should verify that the expression values stored in your CellDataSet follow a distribution that is roughly lognormal:

library(reshape2)  # provides melt()
library(ggplot2)   # provides qplot() and stat_function()

# Log-transform each value in the expression matrix.
L <- log(exprs(HSMM[expressed_genes, ]))

# Standardize each gene, so that they are all on the same scale.
# Then melt the data with reshape2 so we can plot it easily.
melted_dens_df <- melt(t(scale(t(L))))

# Plot the distribution of the standardized gene expression values,
# with a standard normal density overlaid in red for comparison.
qplot(value, geom = "density", data = melted_dens_df) +
  stat_function(fun = dnorm, size = 0.5, color = "red") +
  xlab("Standardized log(FPKM)") +
  ylab("Density")

## Warning: Removed 2854443 rows containing non-finite values (stat_density).

The warning is expected: an expression value of zero has a logarithm of -Inf, and such non-finite values are dropped before the density is estimated.
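The input requirements from Section 2.1 and the per-gene and per-cell detection counts mentioned above can be illustrated without Monocle itself. The following is a minimal sketch in base R on an invented toy matrix. The object names (toy_exprs, toy_pheno, toy_feature), the helper check_inputs, and the detection threshold of 0.1 are assumptions chosen for illustration and are not part of the package.

```r
# Toy FPKM-like matrix: 3 genes (rows) x 4 cells (columns). All values are invented.
toy_exprs <- matrix(
  c(0.0, 5.2, 0.3, 12.1,
    8.4, 0.0, 0.0,  0.05,
    1.1, 2.2, 3.3,  4.4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3", "cell4"))
)

# Cell attributes: one row per cell, row names equal to the matrix column names.
toy_pheno <- data.frame(Hours = c(0, 0, 24, 24),
                        row.names = colnames(toy_exprs))

# Gene attributes: one row per gene, row names equal to the matrix row names.
toy_feature <- data.frame(
  biotype = c("protein_coding", "lincRNA", "protein_coding"),
  row.names = rownames(toy_exprs)
)

# Hypothetical helper: stops with an error if the three inputs disagree
# in dimensions or in names.
check_inputs <- function(exprs, pheno, feature) {
  stopifnot(
    is.numeric(exprs),
    ncol(exprs) == nrow(pheno),
    nrow(exprs) == nrow(feature),
    identical(colnames(exprs), rownames(pheno)),
    identical(rownames(exprs), rownames(feature))
  )
  invisible(TRUE)
}
check_inputs(toy_exprs, toy_pheno, toy_feature)

# Assumed detection threshold: a gene counts as expressed in a cell
# when its value is at least min_expr.
min_expr <- 0.1
num_cells_expressed <- rowSums(toy_exprs >= min_expr)  # per gene
num_genes_expressed <- colSums(toy_exprs >= min_expr)  # per cell

print(num_cells_expressed)
print(num_genes_expressed)
```

Running the check before constructing a CellDataSet surfaces mismatched row or column names early, when they are still easy to trace back to the input files.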
4 Basic differential expression analysis

Differential gene expression analysis is a common task in RNA-Seq experiments. Monocle can help you find genes that are differentially expressed between groups of cells and assess the statistical significance of those changes. These comparisons require that you have a way to collect your cells into two or more groups. These groups are defined by columns in the phenoData table of each CellDataSet. Monocle will assess the significance of each gene's expression level across the different groups of cells.

Performing differential expression analysis on all genes in the human genome can take a substantial amount of time. For a dataset as large as the myoblast data from [1], which contains several hundred cells, the analysis can take several hours on a single CPU. Let's select a small set of genes that we know are important in myogenesis to demonstrate Monocle's capabilities:

marker_genes <- row.names(subset(fData(HSMM),
    gene_short_name %in% c("MEF2C", "MEF2D", "MYF5", "ANPEP", "PDGFRA",
                           "MYOG", "TPM1", "TPM2", "MYH2", "MYH3",
                           "NCAM1", "TNNT1", "TNNT2", "TNNC1", "CDK1",
                           "CDK2", "CCNB1", "CCNB2", "CCND1", "CCNA1",
                           "ID1")))

plot_spanning_tree(HSMM)

HSMM_filtered <- HSMM[expressed_genes, pData(HSMM)$State != 3]
my_genes <- row.names(subset(fData(HSMM_filtered),
    gene_short_name %in% c("CDK1", "MEF2C", "MYH3")))
cds_subset <- HSMM_filtered[my_genes, ]
plot_genes_in_pseudotime(cds_subset, color_by = "Time")
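To make the group comparison described at the start of Section 4 concrete, the sketch below tests each gene for a difference between two groups of cells using a likelihood ratio test on ordinary linear models fitted to log-scale values. This is a stand-in chosen for illustration and is not necessarily the model Monocle fits. The data are simulated, and the group labels, gene names, and the helper lrt_pvalue are all hypothetical.

```r
set.seed(1)

# Simulated log-scale expression for 40 cells in two hypothetical groups.
group <- factor(rep(c("growth", "differentiation"), each = 20))
log_expr <- cbind(
  changed   = rnorm(40, mean = ifelse(group == "growth", 1, 4), sd = 0.5),
  unchanged = rnorm(40, mean = 2, sd = 0.5)
)

# Hypothetical helper: likelihood ratio test of a model with a group term
# against an intercept-only model, for a single gene.
lrt_pvalue <- function(y, group) {
  full    <- lm(y ~ group)
  reduced <- lm(y ~ 1)
  stat <- as.numeric(2 * (logLik(full) - logLik(reduced)))
  df   <- attr(logLik(full), "df") - attr(logLik(reduced), "df")
  pchisq(stat, df = df, lower.tail = FALSE)
}

# One test per gene (column), followed by Benjamini-Hochberg correction
# to account for testing many genes at once.
pvals <- apply(log_expr, 2, lrt_pvalue, group = group)
qvals <- p.adjust(pvals, method = "BH")

print(data.frame(pval = pvals, qval = qvals))
```

Because one test is run per gene, the cost grows linearly with the number of genes, which is why restricting the analysis to a short list of marker genes keeps a demonstration fast.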