/
Gene set enrichment analysis with topGO Adrian Alexa J Gene set enrichment analysis with topGO Adrian Alexa J

Gene set enrichment analysis with topGO Adrian Alexa J - PDF document

celsa-spraggs
celsa-spraggs . @celsa-spraggs
Follow
426 views
Uploaded On 2015-06-10

Gene set enrichment analysis with topGO Adrian Alexa J - PPT Presentation

mpisbmpgde alexa Contents 1 Introduction 2 Instalation 3 Quick start guide 31 Data preparation 3 32 Performing the enrichment tests 4 33 Analysis of resul ID: 83632

mpisbmpgde alexa Contents Introduction

Share:

Link:

Embed:

Download Presentation from below link

Download Pdf The PPT/PDF document "Gene set enrichment analysis with topGO ..." is the property of its rightful owner. Permission is granted to download and print the materials on this web site for personal, non-commercial use only, and to display it on your personal computer provided you do not modify the materials and that you retain all copyright notices contained in the materials. By downloading content from our website, you accept the terms of this agreement.


Presentation Transcript

GenesetenrichmentanalysiswithtopGOAdrianAlexa,JorgRahnenfuhrerOctober30,2020http://www.mpi-sb.mpg.de/alexaContents1Introduction22Instalation23Quickstartguide33.1Datapreparation...........................................33.2Performingtheenrichmenttests..................................43.3Analysisofresults..........................................54Loadinggenesandannotationsdata84.1Gettingstarted............................................84.2ThetopGOdataobject........................................84.3Customannotations.........................................94.4Prede nedlistofinterestinggenes.................................104.5Usingthegenesscore.........................................114.6FilteringandmissingGOannotations...............................125WorkingwiththetopGOdataobject146Runningtheenrichmenttests166.1De ningandrunningthetest....................................176.2Theadjustmentofp-values.....................................196.3Addinganewtest..........................................196.4runTest:ahigh-levelinterfacefortesting.............................197Interpretationandvisualizationofresults207.1ThetopGOresultobject.......................................207.2Summarisingtheresults.......................................217.3AnalysingindividualGOs......................................227.4VisualisingtheGOstructure....................................238SessionInformation26References261 1IntroductionThetopGOpackageisdesignedtofacilitatesemi-automatedenrichmentanalysisforGeneOntology(GO)terms.Theprocessconsistsofinputofnormalisedgeneexpressionmeasurements,gene-wisecorrelationordi erentialexpressionanalysis,enrichmentanalysisofGOterms,interpretationandvisualisationoftheresults.OneofthemainadvantagesoftopGOistheuni edgenesettestingframeworkito ers.BesidesprovidinganeasytousesetoffunctionsforperformingGOenrichmentanalysis,italsoenablestheusertoeasilyimplementnewstatisticaltestsornewalgorithmsthatdealwiththeGOgraphstructure.Thisuni edframeworkalsofacilitatesthecomparisonbetweendi erentGOenrichmentmethodologies.ThereareanumberofteststatisticsandalgorithmsdealingwiththeGOgraphstructuredreadytouseintopGO.Table1presentsthecompatibilitytablebetweentheteststatisticsandGOgraphmethods. sherkstglobaltestsum classic elim weight weight01 lea parentchild Table1:AlgorithmscurrentlysupportedbytopGO.TheelimandweightalgorithmswereintroducedinAlexaetal.(2006).ThedefaultalgorithmusedbythetopGOpackageisamixturebetweentheelimandtheweightalgorithmsanditwillbereferredasweight01.TheparentChildalgorithmwasintroducedbyGrossmannetal.(2007).WeassumetheuserhasagoodunderstandingofGO,seeConsortium(2001),andisfamiliarwithgenesetenrichmenttests.AlsothisdocumentrequiresbasicknowledgeofRlanguage.ThenextsectionpresentsaquicktourintotopGOandisthoughttobeindependentoftherestofthismanuscript.TheremainingsectionsprovidedetailsonthefunctionsusedinthesamplesectionaswellasshowingmoreadvancefunctionalityimplementedinthetopGOpackage.2InstalationThissectionbrie ydescribethenecessarytogettopGOrunningonyoursystem.WeassumethattheuserhavetheRprogram(seetheRprojectathttp://www.r-project.org)alreadyinstalledanditsfamiliarwithit.YouwillneedtohaveR2.10.0orlatertobeabletoinstallandruntopGO.ThetopGOpackageisavailablefromtheBioconductorrepositoryathttp://www.bioconductor.orgTobeabletoinstallthepackageoneneeds rsttoinstallthecoreBioconductorpackages.IfyouhavealreadyinstalledBioconductorpackagesonyoursystemthenyoucanskipthetwolinesbelow.�if(!requireNamespace("BiocManager",quietly=TRUE))+install.packages("BiocManager")�BiocManager::install()OncethecoreBioconductorpackagesareinstalled,wecaninstallthetopGOpackageby�if(!requireNamespace("BiocManager",quietly=TRUE))+install.packages("BiocManager")�BiocManager::install("topGO") 3QuickstartguideThissectiondescribesasimpleworkingsessionusingtopGO.Thereareonlyahandfulofcommandsnecessarytoperformagenesetenrichmentanalysiswhichwillbebrie ypresentedbelow.Atypicalsessioncanbedividedintothreesteps:1.Datapreparation:Listofgenesidenti ers,genescores,listofdi erentiallyexpressedgenesoracriteriaforselectinggenesbasedontheirscores,aswellasgene-to-GOannotationsareallcollectedandstoredinasingleRobject.2.Runningtheenrichmenttests:Usingtheobjectcreatedinthe rststeptheusercanperformenrichmentanalysisusinganyfeasiblemixtureofstatisticaltestsandmethodsthatdealwiththeGOtopology.3.Analysisoftheresults:Theresultsobtainedinthesecondstepareanalysedusingsummaryfunctionsandvisualisationtools.Beforegoingthrougheachofthosestepstheuserneedstodecidewhichbiologicalquestionhewouldliketoinvestigate.Theaimofthestudy,aswellasthenatureoftheavailabledata,willdictatewhichteststatistic/methodsneedtoused.InthissectionwewilltesttheenrichmentofGOtermswithdi erentiallyexpressedgenesusingtwostatisticaltests,namelyKolmogorov-SmirnovtestandFisher'sexacttest.3.1DatapreparationInthe rststepaconvenientRobjectofclasstopGOdataiscreatedcontainingalltheinformationrequiredfortheremainingtwosteps.Theuserneedstoprovidethegeneuniverse,GOannotationsandeitheracriteriaforselectinginterestinggenes(e.g.di erentiallyexpressedgenes)fromthegeneuniverseorascoreassociatedwitheachgene.InthissessionwewilltesttheenrichmentofGOtermswithdi erentiallyexpressedgenes.Thus,thestartingpointisalistofgenesandtherespectivep-valuesfordi erentialexpression.Atoyexampleofalistofgenep-valuesisprovidedbythegeneListobject.�library(topGO)�library(ALL)�data(ALL)�data(geneList)ThegeneListdataisbasedonadi erentialexpressionanalysisoftheALL(AcuteLymphoblasticLeukemia)datasetthatwasextensivelystudiedintheliteratureonmicroarrayanalysisChiaretti,S.,etal.(2004).Ourtoyexamplecontainsjustasmallamount,323,ofgenesandtheircorrespondingp-values.Thenextdataoneneedsarethegenegroupsitself,theGOtermsinourcase,andthemappingthatassociateeachgenewithoneormoreGOterm(s).Theinformationonwhereto ndtheGOannotationsisstoredintheALLobjectanditiseasilyaccessible.�affyLib-paste(annotation(ALL),"db",sep=".")ကlibrary(package=affyLib,character.only=TRUE)Themicroarrayusedintheexperimentisthehgu95av2fromA ymetrix,aswecanseefromtheaffyLibobject.WhenweloadedthegeneListobjectaselectionfunctionusedforde ningthelistofdi erentiallyexpressedgenesisalsoloadedunderthenameoftopDiffGenes.Thefunctionassumesthattheprovidedargumentisanamedvectorofp-values.Withthehelpofthisfunctionwecanseethatthereare50geneswitharawp-valuelessthan0:01outofatotalof323genes.ကsum(topDiffGenes(geneList))[1]50 WenowhavealldatanecessarytobuildanobjectoftypetopGOdata.Thisobjectwillcontainallgeneidenti ersandtheirscores,theGOannotations,theGOhierarchicalstructureandallotherinformationneededtoperformthedesiredenrichmentanalysis.�sampleGOdata-new("topGOdata",+description="Simplesession",ontology="BP",+allGenes=geneList,geneSel=topDiffGenes,+nodeSize=10,+annot=annFUN.db,affyLib=affyLib)ThenamesoftheargumentsusedforbuildingthetopGOdataobjectshouldbeself-explanatory.WequicklymentionthatnodeSize=10isusedtoprunetheGOhierarchyfromthetermswhichhavelessthan10annotatedgenesandthatannFUN.dbfunctionisusedtoextractthegene-to-GOmappingsfromtheaffyLibobject.Section4.2describestheparametersusedtobuildthetopGOdataindetails.AsummaryofthesampleGOdataobjectcanbeseenbytypingtheobjectnameattheRprompt.Havingallthedatastoredintothisobjectfacilitatestheaccesstoidenti ers,annotationsandtobasicdatastatistics.ကsampleGOdata3.2PerformingtheenrichmenttestsOncewehaveanobjectofclasstopGOdatawecanstartwiththeenrichmentanalysis.Wewillusetwotypesofteststatistics:Fisher'sexacttestwhichisbasedongenecounts,andaKolmogorov-Smirnovliketestwhichcomputesenrichmentbasedongenescores.Wecanuseboththesetestssinceeachgenehasascore(representinghowdi erentiallyexpressedageneis)andbythemeansoftopDiffGenesfunctionsthegenesarecategorizedintodi erentiallyexpressedornotdi erentiallyexpressedgenes.AllthesearestoredintosampleGOdataobject.ThefunctionrunTestisusedtoapplythespeci edteststatisticandmethodtothedata.Ithasthreemainarguments.The rstargumentneedstobeanobjectofclasstopGOdata.Thesecondandthirdargumentareoftypecharacter;theyspecifythemethodfordealingwiththeGOgraphstructureandtheteststatistic,respectively.First,weperformaclassicalenrichmentanalysisbytestingtheover-representationofGOtermswithinthegroupofdi erentiallyexpressedgenes.ForthemethodclassiceachGOcategoryistestedindependently.ကresultFisher-runTest(sampleGOdata,algorithm="classic",statistic="fisher")runTestreturnsanobjectofclasstopGOresult.Ashortsummaryofthisobjectisshownbelow.ကresultFisherDescription:SimplesessionOntology:BP'classic'algorithmwiththe'fisher'test1093GOtermsscored:54termswithp0.01Annotationdata:Annotatedgenes:310Significantgenes:46Min.no.ofgenesannotatedtoaGO:10Nontrivialnodes:998NextwewilltesttheenrichmentusingtheKolmogorov-Smirnovtest.Wewillusetheboththeclassicandtheelimmethod.&#x-523;resultKS-runTest(sampleGOdata,algorithm="classic",statistic="ks")ကresultKS.elim-runTest(sampleGOdata,algorithm="elim",statistic="ks") GO.IDTermAnnotatedSigni cantExpectedRankinclassicFisherclassicFisherclassicKSelimKS 1GO:0051301celldivision1461621.669770.976577.8e-082.4e-072GO:0031668cellularresponsetoextracellularstimu...1181.6311.6e-054.0e-054.0e-053GO:0090068positiveregulationofcellcycleproces...3955.796510.722890.000180.000184GO:0010389regulationofG2/Mtransitionofmitotic...3074.452860.135350.000190.000195GO:0140014mitoticnucleardivision94713.959910.996800.001180.001186GO:0051726regulationofcellcycle1321819.596580.748808.4e-050.001527GO:0045931positiveregulationofmitoticcellcycl...3745.497990.836610.001710.001718GO:0050851antigenreceptor-mediatedsignalingpath...1171.6370.000210.002080.002089GO:1900221regulationofamyloid-betaclearance1051.48430.008270.002870.0028710GO:0031667responsetonutrientlevels1371.93200.000880.002920.00292 Table2:Signi canceofGOtermsaccordingtoclassicandelimmethods.Pleasenotethatnotallstatisticaltestsworkwitheverymethod.ThecompatibilitymatrixbetweenthemethodsandstatisticaltestsisshowninTable1.Thep-valuescomputedbytherunTestfunctionareunadjustedformultipletesting.Wedonotadvocateagainstadjustingthep-valuesofthetestedgroups,howeverinmanycasesadjustedp-valuesmightbemisleading.3.3AnalysisofresultsAftertheenrichmenttestsareperformedtheresearcherneedstoolsforanalysingandinterpretingtheresults.GenTableisaneasytousefunctionforanalysingthemostsigni cantGOtermsandthecorrespondingp-values.Inthefollowingexample,welistthetop10signi cantGOtermsidenti edbytheelimmethod.Atthesametimewealsocomparetheranksandthep-valuesoftheseGOtermswiththeonesobtainedbytheclassicmethod.�allRes-GenTable(sampleGOdata,classicFisher=resultFisher,+classicKS=resultKS,elimKS=resultKS.elim,+orderBy="elimKS",ranksOf="classicFisher",topNodes=10)TheGenTablefunctionreturnsadataframecontainingthetoptopNodesGOtermsidenti edbytheelimalgorithm,seeorderByargument.ThedataframeincludessomestatisticsontheGOtermsandthep-valuescorrespondingtoeachofthetopGOresultobjectspeci edasarguments.Table2showstheresults.ForaccessingtheGOterm'sp-valuesfromatopGOresultobjecttheusershouldusethescorefunctions.Asasimpleexample,welookatthedi erencesbetweentheresultsoftheclassicandtheelimmethodsinthecaseoftheKolmogorov-Smirnovtest.Theelimmethodwasdesigntobemoreconservativethentheclassicmethodandthereforeoneexpectsthep-valuesreturnedbytheformermethodarelowerboundedbythep-valuesreturnedbythelatermethod.Theeasiestwaytovisualizethispropertyistoscatterplotthetwosetsofp-valuesagainsteachother.ကpValue.classic-score(resultKS)ကpValue.elim-score(resultKS.elim)[names(pValue.classic)]ကgstat-termStat(sampleGOdata,names(pValue.classic))ကgSize-gstat$Annotated/max(gstat$Annotated)*4ကgCol-colMap(gstat$Significant)ကplot(pValue.classic,pValue.elim,xlab="p-valueclassic",ylab="p-valueelim",+pch=19,cex=gSize,col=gCol)WecanseeinFigure1thatthereareindeeddi erencesbetweenthetwomethods.SomeGOtermsfoundsigni cantbytheclassicmethodarelesssigni cantintheelim,asexpected.However,insomecases,wecanvisibleidentifyafewGOtermsforwhichtheelimp-valueislessconservativethentheclassicp-value.IfsuchGOtermsexist,wecanidentifythemand ndthenumberofannotatedgenes:ကsel.go-names(pValue.classic)[pValue.elimpValue.classic]&#x-523;cbind(termStat(sampleGOdata,sel.go),+elim=pValue.elim[sel.go],+classic=pValue.classic[sel.go]) Figure1:p-valuesscatterplotfortheclassic(xaxis)andelim(yaxis)methods.Ontherightpanelthep-valuesareplottedonalinearscale.Theleftpannedplotsthesamep-valuesonalogarithmicscale.ThesizeofthedotisproportionalwiththenumberofannotatedgenesfortherespectiveGOtermanditscoloringrepresentsthenumberofsigni cantlydi erentiallyexpressedgenes,withthedarkredpointshavingmoregenesthentheyellowones.AnnotatedSignificantExpectedelimclassicGO:00081522534037.540.372688250.43290774GO:00090581753025.970.123740370.20607285GO:00090591212417.950.096318260.22106711GO:00098891051815.580.270043550.33891272GO:00192221712725.370.363613610.48920146GO:00313231622524.040.383081420.43204977GO:00346451172117.360.205798100.31818382GO:00442372463936.500.318906560.38029888GO:00442491712725.370.252852570.37807808GO:0050793841912.460.039153930.05828868GO:0051094691610.240.055902690.06214595GO:00717042524037.390.364694360.42515582GO:19015761753025.970.123740370.20607285Itisquiteinterestingthatsuchcasesappear-theabovetablewillnotbeempty.TheseGOtermsarerathergeneral(havingmanyannotatedgenes)andtheirp-valuesarenotsigni cantatthe0:05level.ThereforetheseGOtermswouldnotappearinthelistoftopsigni cantterms.Moresigni cantGOtermsarelesslikelytobein uencedbythisnonmonotonicbehavior.Anotherinsightfulwayoflookingattheresultsoftheanalysisistoinvestigatehowthesigni cantGOtermsaredistributedovertheGOgraph.Figure2showsthethesubgraphinducedbythe5mostsigni cantGOtermsasidenti edbytheelimalgorithm.Signi cantnodesarerepresentedasrectangles.Theplottedgraphistheupperinducedgraphgeneratedbythesesigni cantnodes.�showSigOfNodes(sampleGOdata,score(resultKS.elim),firstSigNodes=5,useInfo='all') Figure2:Thesubgraphinducedbythetop5GOtermsidenti edbytheelimalgorithmforscoringGOtermsforenrichment.Rectanglesindicatethe5mostsigni cantterms.Rectanglecolorrepresentstherelativesigni cance,rangingfromdarkred(mostsigni cant)tobrightyellow(leastsigni cant).Foreachnode,somebasicinformationisdisplayed.The rsttwolinesshowtheGOidenti erandatrimmedGOname.Inthethirdlinetherawp-valueisshown.Theforthlineisshowingthenumberofsigni cantgenesandthetotalnumberofgenesannotatedtotherespectiveGOterm. 4Loadinggenesandannotationsdata4.1GettingstartedTodemonstratethepackagefunctionalitywewillusetheALL(AcuteLymphoblasticLeukemia)geneexpres-siondatafromChiaretti,S.,etal.(2004).Thedatasetconsistsof128microarraysfromdi erentpatientswithALLmeasuredusingtheHGU95aV2A ymetrixchip.Additionally,customannotationsandarti cialdatasetswillbeusedtodemonstratespeci cfeatures.We rstloadtherequiredlibrariesanddata:�library(topGO)�library(ALL)�data(ALL)WhenthetopGOpackageisloadedthreeenvironmentsGOBPTerm,GOMFTermandGOCCTermarecreatedandboundtothepackageenvironment.TheseenvironmentsarebuildbasedontheGOTERMenvironmentfrompackageGO.db.Theyareusedforfastrecoveringoftheinformationspeci ctoeachofthethreeontologies:BP,MFandCC.InordertoaccessallGOgroupsthatbelongtoaspeci contology,e.g.BiologicalProcess(BP),onecantype:�BPterms-ls(GOBPTerm)ကhead(BPterms)[1]"GO:0000001""GO:0000002""GO:0000003""GO:0000011""GO:0000012""GO:0000017"Usuallyoneneedstoremoveprobes/geneswithlowexpressionvalueaswellasprobeswithverysmallvariabilityacrosssamples.Packagegenefilterprovidestoolsfor lteringgenes.Inthisanalysiswechooseto lterasmanygenesaspossibleforcomputationalreasons;workingwithasmallergeneuniverseallowsustoexemplifymoreofthefunctionalitiesimplementedinthetopGOpackageandatthesametimeallowsthisdocumenttobecompiledinarelativelyshorttime.Thee ectofgene lteringisdiscussedinmoredetailsinSection4.6.ကlibrary(genefilter)ကselProbes-genefilter(ALL,filterfun(pOverA(0.20,log2(100)),function(x)(IQR(x)က0.25)))ကeset-ALL[selProbes,]The lterselectsonly4101probesetsoutof12625probesetsavailableonthehgu95av2array.ThegeneuniverseandthesetofinterestinggenesThesetofallgenesfromthearray/studywillbereferredfromnowonasthegeneuniverse.Havingthegeneuniverse,theusercande nealistofinterestinggenesortocomputegene-wisescoresthatquantifythesigni canceofeachgene.Whengene-wisescoresareavailablethelistofinterestinggenesisde nedtobethesetofgenewithasigni cantscore.ThetopGOpackagedealswiththesetwocasesinauni edwayoncethemaindatacontainer,thetopGOdataobject,isconstructed.Theonlytimetheuserneedstodistinguishbetweenthesetwocasesisduringtheconstructionofthedatacontainer.Usually,thegeneuniverseisde nedasallfeasiblegenesmeasuredbythemicroarray.InthecaseoftheALLdatasetwehave4101feasiblegenes,theonesthatwerenotremovedbythe lteringprocedure.4.2ThetopGOdataobjectThecentralstepinusingthetopGOpackageistocreateatopGOdataobject.ThisobjectwillcontainallinformationnecessaryfortheGOanalysis,namelythelistofgenes,thelistofinterestinggenes,thegenescores(ifavailable)andthepartoftheGOontology(theGOgraph)whichneedstobeusedintheanalysis.ThetopGOdataobjectwillbetheinputofthetestingprocedures,theevaluationandvisualisationfunctions.Tobuildsuchanobjecttheuserneedsthefollowing: ˆAlistofgeneidenti ersandoptionallythegene-wisescores.Thescorecanbethet-teststatistic(orthep-value)fordi erentialexpression,correlationwithaphenotype,oranyotherrelevantscore.ˆAmappingbetweengeneidenti ersandGOterms.InmostcasesthismappingisdirectlyavailableinBioconductorasamicroarrayspeci cannotationpackage.Inthiscasetheuserjustneedstospecifythenameoftheannotationtobeused.Forexample,theannotationpackageneededfortheALLdatasetishgu95av2.db.Ofcourse,Bioconductordoesnotincludeup-to-dateannotationpackagesforallplatforms.Userswhoworkwithcustomarraysorwishtouseaspeci cmappingbetweengenesandGOterms,havethepossibilitytoloadcustomannotations.ThisisdescribedinSection4.3.ˆTheGOhierarchicalstructure.ThisstructureisobtainedfromtheGO.dbpackage.AtthemomenttopGOsupportsonlytheontologyde nitionprovidedbyGO.db.Wefurtherdescribetheargumentsoftheinitializefunction(new)usedtoconstructaninstanceofthisdatacontainerobject.ontology:characterstringspecifyingtheontologyofinterest(BP,MForCC)description:characterstringcontainingashortdescriptionofthestudy[optional].allGenes:namedvectoroftypenumericorfactor.Thenamesattributecontainsthegenesidenti ers.Thegeneslistedinthisobjectde nethegeneuniverse.geneSelectionFun:functiontospecifywhichgenesareinterestingbasedonthegenescores.Itshouldbepresenti theallGenesobjectisoftypenumeric.nodeSize:anintegerlargerorequalto1.ThisparameterisusedtoprunetheGOhierarchyfromthetermswhichhavelessthannodeSizeannotatedgenes(afterthetruepathruleisapplied).annotationFun:functionwhichmapsgenesidenti erstoGOterms.Thereareacoupleofannotationfunctionincludedinthepackagetryingtoaddresstheuser'sneeds.Theannotationfunctionstakethreearguments.Oneofthoseargumentsisspecifyingwherethemappingscanbefound,andneedstobeprovidedbytheuser.Herewegiveashortdescriptionofeach:annFUN.dbthisfunctionisintendedtobeusedaslongasthechipusedbytheuserhasanannotationpackageavailableinBioconductor.annFUN.orgthisfunctionisusingthemappingsfromthe"org.XX.XX"annotationpackages.Currently,thefunctionsupportsthefollowinggeneidenti ers:Entrez,GenBank,Alias,Ensembl,GeneSymbol,GeneNameandUniGene.annFUN.gene2GOthisfunctionisusedwhentheannotationsareprovidedasagene-to-GOsmapping.annFUN.GO2genethisfunctionisusedwhentheannotationsareprovidedasaGO-to-genesmapping.annFUN.filethisfunctionwillreadtheannotationsofthetypegene2GOorGO2genesfromatext le....:listofargumentstobepassedtotheannotationFun4.3CustomannotationsThissectiondescribeshowcustomGOannotationscanbeusedforbuildingatopGOdataobject.Annotationsneedtobeprovidedeitherasgene-to-GOsorasGO-to-genesmappings.Anexampleofsuchmappingcanbefoundinthe"topGO/examples"directory.The le"geneid2go.map"containsgene-to-GOsmappings.Foreachgeneidenti erarelistedtheGOtermstowhichthisgeneisspeci callyannotated.WeusethereadMappingsfunctiontoparsethis le.�geneID2GO-readMappings(file=system.file("examples/geneid2go.map",package="topGO"))ကstr(head(geneID2GO)) Listof6$068724:chr[1:5]"GO:0005488""GO:0003774""GO:0001539""GO:0006935"...$119608:chr[1:6]"GO:0005634""GO:0030528""GO:0006355""GO:0045449"...$049239:chr[1:13]"GO:0016787""GO:0017057""GO:0005975""GO:0005783"...$067829:chr[1:16]"GO:0045926""GO:0016616""GO:0000287""GO:0030145"...$106331:chr[1:10]"GO:0043565""GO:0000122""GO:0003700""GO:0005634"...$214717:chr[1:7]"GO:0004803""GO:0005634""GO:0008270""GO:0003677"...TheobjectreturnedbyreadMappingsisanamedlistofcharactervectors.Thelistnamesgivethegenesidenti ers.EachelementofthelistisacharactervectorandcontainstheGOidenti ersannotatedtothespeci cgene.Itissucientforthemappingtocontainonlythemostspeci cGOannotations.However,topGOcanalsotakeasaninput lesinwhichallorsomeancestorsofthemostspeci cGOannotationsareincluded.Thisredundancyisnotmakingforafasterrunningtimeandifpossibleitshouldbeavoided.Theusercanreadtheannotationsfromtext lesortheycanbuildanobjectsuchasgeneID2GOdirectlyintoR.Thetext leformatrequiredbythereadMappingsfunctionisverysimple.Itconsistsofonelineforeachgenewiththefollowingsyntax:gene_IDTABကGO_ID1,GO_ID2,GO_ID3,....ReadingGO-to-genesmappingsfroma leisalsopossibleusingthereadMappingsfunction.However,itistheuserresponsibilitytoknowthedirectionofthemappings.Theusercaneasilytransformamappingfromgene-to-GOstoGO-to-genes(orvice-versa)usingthefunctioninverseList:ကGO2geneID-inverseList(geneID2GO)ကstr(head(GO2geneID))Listof6$GO:0000122:chr"106331"$GO:0000139:chr[1:6]"133103""111846""109956""161395"...$GO:0000166:chr[1:10]"067829""157764""100302""074582"...$GO:0000186:chr"181104"$GO:0000209:chr"159461"$GO:0000228:chr"214717"4.4Prede nedlistofinterestinggenesIftheuserhassomeaprioriknowledgeaboutasetofinterestinggenes,hecantesttheenrichmentofGOtermswithregardtothislistofinterestinggenes.Inthisscenario,whenonlyalistofinterestinggenesisprovided,theusercanuseonlytestsstatisticsthatarebasedongenecounts,likeFisher'sexacttest,Zscoreandalike.Todemonstratehowcustomannotationcanbeusedthissectionisbasedonthetoydataset,thegeneID2GOdata,fromSection4.3.Thegeneuniverseinthiscaseisgivenbythelistnames:ကgeneNames-names(geneID2GO)ကhead(geneNames)Sincefortheavailablegeneswedonothaveanymeasurementandthusnocriteriatoselectinterestinggenes,werandomlyselect10%genesfromthegeneuniverseandconsiderthemasinterestinggenes.ကmyInterestingGenes-sample(geneNames,length(geneNames)/10)ကgeneList-factor(as.integer(geneNames%in%myInterestingGenes))ကnames(geneList)-geneNamesကstr(geneList)Factorw/2levels"0","1":1111111111...-attr(*,"names")=chr[1:100]"068724""119608""049239""067829"... ThegeneListobjectisanamedfactorthatindicateswhichgenesareinterestingandwhichnot.Itshouldbestraightforwardtocomputesuchanamedvectorinarealcasesituation,wheretheuserhashisownprede nedlistofinterestinggenes.WenowhavealltheelementstoconstructatopGOdataobject.TobuildthetopGOdataobject,wewillusetheMFontology.ThemappingisgivenbythegeneID2GOlistwhichwillbeusedwiththeannFUN.gene2GOfunction.�GOdata-new("topGOdata",ontology="MF",allGenes=geneList,+annot=annFUN.gene2GO,gene2GO=geneID2GO)ThebuildingoftheGOdataobjectcantakesometime,dependingonthenumberofannotatedgenesandonthechosenontology.InourexampletherunningtimeisquitefastgiventhatwehavearathersmallsizegeneuniversewhichalsoimplyamoderatesizeGOontology,especiallysinceweareusingtheMFontology.Theadvantageofhaving(informationon)thegenescores(orbettergenesmeasurements)aswellasawaytode newhicharetheinterestinggenes,inthetopGOdataobjectisthatonecanapplyvariousgrouptestingprocedure,whichletustestmultiplehypothesisortunewithdi erentparameters.BytypingGOdataattheRprompt,theusercanseeasummaryofthedata.ကGOdata-------------------------topGOdataobject-------------------------Description:-Ontology:-MF100availablegenes(allgenesfromthearray):-symbol:068724119608049239067829106331...-10significantgenes.87feasiblegenes(genesthatcanbeusedintheanalysis):-symbol:068724119608049239067829106331...-9significantgenes.GOgraph(nodeswithatleast1genes):-agraphwithdirectededges-numberofnodes=240-numberofedges=318-------------------------topGOdataobject-------------------------OneimportantpointtonoticeisthatnotallthegenesthatareprovidedbygeneList,theinitialgeneuniverse,canbeannotatedtotheGO.Thiscanbeseenbycomparingthenumberofallavailablegenes,thegenespresentingeneList,withthenumberoffeasiblegenes.Wearethereforeforcedatthispointtorestrictthegeneuniversetothesetoffeasiblegenesfortherestoftheanalysis.ThesummaryontheGOgraphshowsthenumberofGOtermsandtherelationsbetweenthemofthespeci edGOontology.ThisgraphcontainsonlyGOtermswhichhaveatleastonegeneannotatedtothem.4.5UsingthegenesscoreInmanycasesthesetofinterestinggenescanbecomputedbasedonascoreassignedtoallgenes,forexamplebasedonthep-valuereturnedbyastudyofdi erentialexpression.Inthiscase,thetopGOdataobjectcanstorethegenesscoreandarulespecifyingthelistofinterestinggenes.Theadvantageofhavingboththe scoresandtheproceduretoselectinterestinggenesencapsulatedinthetopGOdataobjectisthattheusercanchoosedi erenttypesoftestsstatisticsfortheGOanalysiswithoutmodifyingtheinputdata.AtypicalexamplefortheALLdatasetisthestudywhereweneedtodiscriminatebetweenALLcellsdeliveredfromeitherB-cellorT-cellprecursors.�y-as.integer(sapply(eset$BT,function(x)return(substr(x,1,1)=='T')))ကtable(y)Thereare95B-cellALLsamplesand95T-cellALLsamplesinthedataset.Atwo-sidedt-testcanbyappliedusingthefunctiongetPvalues(awrapingfunctionforthemt.teststatfromthemulttestpackage).BydefaultthefunctioncomputesFDR(falsediscoveryrate)adjustedp-valueinordertoaccountformultipletesting.Adi erenttypeofcorrectioncanbespeci edusingthecorrectionargument.ကgeneList-getPvalues(exprs(eset),classlabel=y,alternative="greater")geneListisanamednumericvector.Thegeneidenti ersarestoredinthenamesattributeofthevector.Thissetofgenesde nesthegeneuniverse.Next,afunctionforspecifyingthelistofinterestinggenesmustbede ned.Thisfunctionneedstoselectgenesbasedontheirscores(inourcasetheadjustedp-values)andmustreturnalogicalvectorspecifyingwhichgeneisselectedandwhichnot.Thefunctionmusthaveoneargument,namedallScoreandmustnotdependonanyattributesofthisobject.Inthisexamplewewillconsiderasinterestinggenesallgeneswithanadjustedp-valuelowerthan0:01.Thiscriteriaisimplementedinthefollowingfunction:ကtopDiffGenes-function(allScore){+return(allScore0.01)+}&#x-523;x-topDiffGenes(geneList)ကsum(x)##thenumberofselectedgenesWithallthesestepsdone,theusercannowbuildthetopGOdataobject.ForashortdescriptionoftheargumentsusedbytheinitializefunctionseeSection4.4ကGOdata-new("topGOdata",+description="GOanalysisofALLdata;B-cellvsT-cell",+ontology="BP",+allGenes=geneList,+geneSel=topDiffGenes,+annot=annFUN.db,+nodeSize=5,+affyLib=affyLib)ItisoftenthecasethatmanyGOtermswhichhavefewannotatedgenesaredetectedtobesigni cantlyenrichedduetoartifactsinthestatisticaltest.ThesesmallsizedGOtermsareoflessimportancefortheanalysisandinmanycasestheycanbeomitted.ByusingthenodeSizeargumenttheusercancontrolthesizeoftheGOtermsusedintheanalysis.OncethegenesareannotatedtotheeachGOtermandthetruepathruleisappliedthenodeswithlessthannodeSizeannotatedgenesareremovedfromtheGOhierarchy.Wefoundthatvaluesbetween5and10forthenodeSizeparameteryieldmorestableresults.ThedefaultvalueforthenodeSizeparameteris1,meaningthatnopruningisperformed.Notethattheonlydi erenceintheinitialisationofanobjectofclasstopGOdatatothecaseinwhichwestartwithaprede nedlistofinterestinggenesistheuseofthegeneSelargument.AllfurtheranalysisdependsonlyontheGOdataobject.4.6FilteringandmissingGOannotationsBeforegoingfurtherwiththeenrichmentanalysisweanalysewhichoftheprobesavailableonthearraycanbeusedintheanalysis. Figure3:ScatterplotofFDRadjustedp-valuesagainstvarianceofprobes.Pointsbelowthehorizontallinearesigni cantprobes.Wewanttoseeifthe lteringperformedinSection4.1removesimportantprobes.Thereareatotalof12625probesonthehgu95av2chip.Oneassumesthatonlythenoisyprobes,probeswithlowexpressionvaluesorsmallvarianceacrosssamplesare lteredoutfromtheanalysis.Thenumberofprobeshaveadirecte ectonthemultipletestingadjustmentofp-values.Toomanyprobeswillresultintooconservativeadjustedp-valueswhichcanbiastheresultoftestslikeFisher'sexacttest.Thusitisimportanttocarefullyanalysethisstep.�allProb-featureNames(ALL)ကgroupProb-integer(length(allProb))+1ကgroupProb[allProb%in%genes(GOdata)]-0ကgroupProb[!selProbes]-2ကgroupProb-factor(groupProb,labels=c("Used","Notannotated","Filtered"))ကtt-table(groupProb)ကttgroupProbUsedNotannotatedFiltered38962058524Outofthe lteredprobesonly96%haveannotationtoGOterms.The lteringprocedureremoves8524probeswhichisaverylargepercentageofprobes(morethan50%),butwedidthisintentionallytoreducetheexpressionsetforcomputationalpurposes.Weperformadi erentialexpressionanalysisonallavailableprobesandwecheckifdi erentiallyexpressedgenesareleavedoutfromtheenrichmentanalysis.ကpValue-getPvalues(exprs(ALL),classlabel=y,alternative="greater")ကgeneVar-apply(exprs(ALL),1,var)ကdd-data.frame(x=geneVar[allProb],y=log10(pValue[allProb]),groups=groupProb)ကxyplot(y~x|groups,data=dd,groups=groups) Figure3showsforthethreegroupsofprobestheadjustedp-valuesandthegene-wisevariance.Probeswithlargechangesbetweenconditionshavelargevarianceandlowp-value.Inanidealcase,onewouldexpecttohavealargedensityofprobesinthelowerrightcornerofUsedpanelandfewprobesinthisregionintheothertwopanels.Wecanseethatthe lteringprocessthrowsoutsomesigni cantprobesandinarealanalysisamoreconservative lteringneedstobeapplied.However,therearealsomanydi erentiallyexpressedprobeswithoutGOannotationwhichcannotbeusedintheanalysis.5WorkingwiththetopGOdataobjectOncethetopGOdataobjectiscreatedtheusercanusevariousmethodsde nedforthisclasstoaccesstheinformationencapsulatedintheobject.Thedescriptionslotcontainsinformationabouttheexperiment.Thisinformationcanbeaccessedorreplacedusingthemethodwiththesamename.�description(GOdata)�description(GOdata)-paste(description(GOdata),"Objectmodifiedon:",format(Sys.time(),"%d%b%Y"),sep="")ကdescription(GOdata)Methodstoobtainthelistofgenesthatwillbeusedinthefurtheranalysisormethodsforobtainingallgenescoresareexempli edbelow.ကa-genes(GOdata)##obtainthelistofgenesကhead(a)[1]"1000_at""1005_at""1007_s_at""1008_f_at""1009_at""100_g_at"ကnumGenes(GOdata)[1]3896Nextwedescribehowtoretrievethescoreofaspeci edsetofgenes,e.g.asetofrandomlyselectedgenes.Iftheobjectwasconstructedusingalistofinterestinggenes,thenthefactorvectorthatwasprovidedatthebuildingoftheobjectwillbereturned.ကselGenes-sample(a,10)ကgs-geneScore(GOdata,whichGenes=selGenes)ကprint(gs)Iftheuserwantsanunnamedvectororthescoreofallgenes:ကgs-geneScore(GOdata,whichGenes=selGenes,use.names=FALSE)ကprint(gs)ကgs-geneScore(GOdata,use.names=FALSE)ကstr(gs)Thelistofsigni cantgenescanbeaccessedusingthemethodsigGenes().ကsg-sigGenes(GOdata)ကstr(sg)ကnumSigGenes(GOdata)AnotherusefulmethodisupdateGeneswhichallowstheusertoupdate/changethelistofgenes(andtheirscores)fromatopGOdataobject.Ifonewantstoupdatethelistofgenesbyincludingonlythefeasibleones,onecantype: �.geneList-geneScore(GOdata,use.names=TRUE)ကGOdata##moreavailablegenesကGOdata-updateGenes(GOdata,.geneList,topDiffGenes)ကGOdata##theavailablegenesarenowthefeasiblegenesTherearealsomethodsavailableforaccessinginformationrelatedtoGOanditsstructure.First,wewanttoknowwhichGOtermsareavailableforanalysisandtoobtainallthegenesannotatedtoasubsetoftheseGOterms.ကgraph(GOdata)##returnstheGOgraphAgraphNELgraphwithdirectededgesNumberofNodes=5961NumberofEdges=13371ကug-usedGO(GOdata)ကhead(ug)[1]"GO:0000002""GO:0000003""GO:0000018""GO:0000027""GO:0000028""GO:0000038"Wefurtherselect10randomGOterms,countthenumberofannotatedgenesandobtaintheirannotation.ကsel.terms-sample(usedGO(GOdata),10)ကnum.ann.genes-countGenesInTerm(GOdata,sel.terms)##thenumberofannotatedgenesကnum.ann.genesကann.genes-genesInTerm(GOdata,sel.terms)##gettheannotationsကhead(ann.genes)Whenthesel.termsargumentismissingallGOtermsareused.Thescoresforallgenes,possiblyannotatedwithnamesofthegenes,canbeobtainedusingthemethodscoresInTerm().ကann.score-scoresInTerm(GOdata,sel.terms)ကhead(ann.score)ကann.score-scoresInTerm(GOdata,sel.terms,use.names=TRUE)ကhead(ann.score)Finally,somestatisticsforasetofGOtermsarereturnedbythemethodtermStat.Asmentionedpreviously,ifthesel.termsargumentismissingthenthestatisticsforallavailableGOtermsarereturned.ကtermStat(GOdata,sel.terms)AnnotatedSignificantExpectedGO:2000463700.61GO:0010878510.44GO:00069392442.09GO:00977201501.31GO:00007312632.26GO:19032061010.87GO:00097497176.18GO:00435253212.78GO:00514571821.57GO:0002026700.61 6RunningtheenrichmenttestsInthissectionweexplainhowwecanrunthedesiredenrichmentmethodoncethetopGOdataobjectisavailable.topGOpackagewasdesignedtoworkwithvariousteststatisticsandvariousalgorithmswhichtaketheGOdependenciesintoaccount.AtthebaseofthisdesignstandsaS4classmechanismwhichfacilitatesde ningandexecutinga(new)grouptest.Threetypesofenrichmenttestscanbedistinguishifwelookatthedatausedbytheeachtest.ˆTestsbasedongenecounts.Thisisthemostpopularfamilyoftests,giventhatitonlyrequiresthepresenceofalistofinterestinggenesandnothingmore.TestslikeFisher'sexacttest,Hypegeometrictestandbinomialtestbelongtothisfamily.Draghicietal.(2006)ˆTestsbasedongenescoresorgeneranks.ItincludesKolmogorov-Smirnovliketests(alsoknownasGSEA),Gentleman'sCategory,t-test,etc.AckermannandStrimmer(2009)ˆTestsbasedongeneexpression.TestslikeGoeman'sglobaltestorGlobalAncovaseparatesfromtheotherssincetheyworkdirectlyontheexpressionmatrix.GoemanandBuhlmann(2007)Therearealsoanumberofstrategies/algorithmstoaccountfortheGOtopology,seeTable1,eachofthemhavingspeci crequirements.ForeachtesttypedescribedaboveandforeachalgorithmthereisS4classde nedinthepackage.Themainideaistohaveaclass(container)whichcanstore,foraspeci edgeneset(GOcategory),alldatanecessaryforcomputingthedesiredteststatistic,andamethodthatwilliterateoverallGOcategories.Insuchadesigntheuserneedstoinstantiateanobjectfromtheclasscorrespondingtothechosenmethod(teststatisticandalgorithm)andthenruntheiteratorfunctiononthisobject.Thede nedS4classesareorganisedinahierarchywhichisshowedinFigure4.Therearetwopossibilities(orinterfaces)forapplyingateststatistictoanobjectofclasstopGOdata.Thebasicinterface,whichprovidesthecoreofthetestingprocedureintopGO,o ersmore exibilitytotheexperiencedRuserallowinghimtoimplementnewteststatisticsornewalgorithms.Thesecondinterfaceismoreuserfriendlybutatthesametimemorerestrictiveinthechoiceofthetestsandalgorithmsused.Wewillfurtherexplainhowthesetwointerfaceswork. Figure4:Theteststatisticsclassstructure. 6.1De ningandrunningthetestThemainfunctionforrunningtheGOenrichmentisgetSigGroups()andittakestwoarguments.The rstargumentisaninstanceofclasstopGOdataandthesecondargumentisaninstanceofclassgroupStatsoranyofitsdescendents.Tobetterunderstandthisprincipleconsiderthefollowingexample.Assumewedecidedtoapplytheclassicalgorithm.Thetwoclassesde nedforthisalgorithmareclassicCountandclassicScore.IfanobjectofthisclassisgivenasaargumenttogetSigGroups()thantheclassicalgorithmwillbeused.ThegetSig-Groups()functioncantakeawhile,dependingonthesizeofthegraph(theontologyused),sobepatient.ThegroupStatsclassesNextweshowhowaninstanceofthegroupStatsclasscanrepresentagenesetandhowtheteststatisticisperformed.Wecomputetheenrichmentofcellularlipidmetabolicprocess(GO:0044255)termusingFisher'sexacttest.Inordertodothisweneedtode nethegeneuniverse,toobtainthegenesannotatedtoGO:0044255,andtode nethesetofsigni cantgenes.�goID-"GO:0044255"ကgene.universe-genes(GOdata)ကgo.genes-genesInTerm(GOdata,goID)[[1]]ကsig.genes-sigGenes(GOdata)NowwecaninstantiateanobjectofclassclassicCount.Oncetheobjectisconstructedwecangetthe22contingencytableorapplytheteststatistic.ကmy.group-new("classicCount",testStatistic=GOFisherTest,name="fisher",+allMembers=gene.universe,groupMembers=go.genes,+sigMembers=sig.genes)ကcontTable(my.group)signotSiganno29237notAnno3103320ကrunTest(my.group)[1]0.1156745TheslottestStatisticcontainsthefunction(ortobemoreprecise,themethod)whichcomputestheteststatistic.WeusedtheGOFisherTestfunctionwhichisavailableinthetopGOpackageandasthenamestatesitimplementsFisher'sexacttest.Theusercande nehisownteststatisticfunctionandthenapplyitusingthepreferredalgorithm.Thefunction,however,shouldusethemethodsde nedforthegroupStatsclasstoaccessthedataencapsulatedinsuchanobject.(ForexampleafunctionwhichcomputestheZscorecanbeeasilyimplementedusingasanexampletheGOFisherTestmethod.)TherunTestmethodisde nedforthegroupStatsclassanditsusedtorun/computetheteststatistic,bycallingthetestStatisticfunction.ThevaluereturnedbytherunTestmethodinthiscaseisthevaluereturnedbyGOFisherTestmethod,whichistheFisher'sexacttestp-value.ThecontTablemethod,showedintheexampleabove,isonlyde nedfortheclassesbasedongenecountsanditsusedtocompilethetwo-dimensionalcontingencytablebasedontheobject.ToshowhowthesameinterfaceisusedfortheclassesbasedongenecountswenextbuildaninstancefortheelimCountclass.Werandomlyselect25%oftheannotatedgenesasgenesthatshouldberemoved.ကset.seed(123)ကelim.genes-sample(go.genes,length(go.genes)/4) �elim.group-new("elimCount",testStatistic=GOFisherTest,name="fisher",+allMembers=gene.universe,groupMembers=go.genes,+sigMembers=sig.genes,elim=elim.genes)ကcontTable(elim.group)signotSiganno24176notAnno3103320ကrunTest(elim.group)[1]0.0639882Weseethattheinterfaceaccountsforthegenesthatneedtobeeliminated,oncetheobjectisinstantiated.Thesamemechanismappliesfortheotherhierarchyofclasses(thescorebasedandexpressionbasedclasses),exceptthateachhierarchyhasitsownspecialisedmethodsforcomputingstatisticsfromthedata.PleasenotethatthegroupStatsclassoranydescendentclassdoesnotdependonGO,andanobjectofsuchaclasscanbeinstantiatedusinganygeneset.PerformingthetestAccordingtothemechanismdescribedabove,one rstde nesateststatisticforthechosenalgorithm,meaningthataninstanceofobjectspeci cforthealgorithmisconstructedinwhichonlytheteststatisticmustbespeci ed,andthencallsagenericfunction(interface)torunthealgorithm.Accordingtothismechanism,one rstde nesateststatisticforthechosenalgorithm,inthiscaseclassicandthenrunsthealgorithm(seethesecondline).TheslottestStatisticcontainstheteststatisticfunction.IntheaboveexampleGOFisherTestfunctionwhichimplementsFisher'sexacttestandisavailableinthetopGOpackagewasused.Ausercande nehisownteststatisticfunctionandthenapplyitusingtheclassicalgorithm.(ForexampleafunctionwhichcomputestheZscorecanbeimplementedusingasanexampletheGOFisherTestmethod.)ကtest.stat-new("classicCount",testStatistic=GOFisherTest,name="Fishertest")ကresultFisher-getSigGroups(GOdata,test.stat)AshortsummaryontheusedtestandtheresultsisprintedattheRconsole.ကresultFisherDescription:GOanalysisofALLdata;B-cellvsT-cellObjectmodifiedon:30Oct2020Ontology:BP'classic'algorithmwiththe'Fishertest'test5961GOtermsscored:167termswithp0.01Annotationdata:Annotatedgenes:3896Significantgenes:339Min.no.ofgenesannotatedtoaGO:5Nontrivialnodes:4286TousetheKolmogorov-Smirnov(KS)testoneneedstoprovidethegene-wisescoresandthusweneedtoinstantiateanobjectofaclasswhichisabletodealofthescores.SuchaclassistheclassicScoreclass,seeFigure4whichwillletusruntheclassicalgorithm.&#x-523;test.stat-new("classicScore",testStatistic=GOKSTest,name="KStests")ကresultKS-getSigGroups(GOdata,test.stat) Themechanismpresentedaboveforclassicalsoholdforelimandweight.Theusershouldpayattentiontothecompatibilitybetweenthechosenclassandthefunctionforcomputingtheteststatistic,sincenoincompatibilitytestaremadewhentheobjectisinstantiated.Forexampletheweightalgorithmwillnotworkwithclassesbasedongene-wisescores.ToruntheelimalgorithmwithKStestoneneedstotype:�test.stat-new("elimScore",testStatistic=GOKSTest,name="Fishertest",cutOff=0.01)ကresultElim-getSigGroups(GOdata,test.stat)Similarly,fortheweightalgorithmwithFisher'sexacttestonetypes:ကtest.stat-new("weightCount",testStatistic=GOFisherTest,name="Fishertest",sigRatio="ratio")ကresultWeight-getSigGroups(GOdata,test.stat)6.2Theadjustmentofp-valuesThep-valuesreturnbythegetSigGroupsfunctionarerowp-values.Thereisnomultipletestingcorrectionappliedtothem,unlesstheteststatisticdirectlyincorporatesuchacorrection.Ofcourse,theresearchercanperformanadjustmentofthep-valuesifheconsidersitisimportantfortheanalysis.Thereasonfornotautomaticallycorrectingformultipletestingare:ˆInmanycasestherowp-valuesreturnbyanenrichmentanalysisarenotthatextreme.AFDR/FWERadjustmentprocedurecaninthiscaseproduceveryconservativep-valuesanddeclareno,orveryfew,termsassigni cant.Thisisnotnecessaryabadthing,butitcanhappenthatthereareinterestingGOtermswhichdidn'tmakeitoverthecuto buttheyareomittedandthusvaluableinformationlost.InthiscasetheresearchermightbeinterestedintherankingoftheGOtermseventhoughnotoptermissigni cantataspecifyFDRlevel.ˆOneshouldkeepinmindthatanenrichmentanalysisconsistofmanystepsandtherearemanyassumptionsdonebeforeapplying,forexample,Fisher'sexacttestonasetofGOterms.PerformingamultipletestingprocedureaccountingonlyonthenumberofGOtermsisfarfrombeingenoughtocontroltheerrorrate.ˆForthemethodsthataccountfortheGOtopologylikeelimandweight,theproblemofmultipletestingisevenmorecomplicated.Hereonecomputesthep-valueofaGOtermconditionedontheneighbouringterms.Thetestsarethereforenotindependentandthemultipletestingtheorydoesnotdirectlyapply.Weliketointerpretthep-valuesreturnedbythesemethodsascorrectedornota ectedbymultipletesting.6.3AddinganewtestExamplefortheCategorytest....6.4runTest:ahigh-levelinterfacefortestingOverthebasicinterfaceweimplementedanabstractlayertoprovidetheuserswithahigherlevelinterfaceforrunningtheenrichmenttests.Theinterfaceiscomposedbyafunction,namelytherunTestfunction,whichcanbeusedonlywithaprede nedsetofteststatisticsandalgorithms.InfactrunTestisawarpingfunctionforthesetofcommandsusedforde ningandrunningatestpresentedinSection6.1.Therearethreemainargumentsthatthisfunctiontakes.The rstargumentisanobjectofclasstopGOdata.Thesecondargument,namedalgorithm,isoftypecharacterandspeci eswhichmethodfordealingwiththeGOgraphstructurewillbeused.Thethirdargument,namedstatistic,speci eswhichgroupteststatisticwillbeused.ToperformaclassicalenrichmentanalysisbyusingtheclassicmethodandFisher'sexacttest,theuserneedstotype: �resultFis-runTest(GOdata,algorithm="classic",statistic="fisher")Variousalgorithmscanbeeasilycombinewithvariousteststatistics.Howevernotallthecombinationswillwork,asseeninTable1.Inthecaseofamismatchthefunctionwillthrowanerror.Thealgorithmargumentisoptionalandifnotspeci edtheweight01methodwillbeused.BellowwecanseemoreexamplesusingtherunTestfunction.ကweight01.fisher-runTest(GOdata,statistic="fisher")ကweight01.t-runTest(GOdata,algorithm="weight01",statistic="t")ကelim.ks-runTest(GOdata,algorithm="elim",statistic="ks")ကweight.ks-runTest(GOdata,algorithm="weight",statistic="ks")#willnotwork!!!ThelastlinewillreturnanerrorbecausewecannotusetheweightmethodwiththeKolmogorov-Smirnovtest.ThemethodsandthestatisticaltestswhichareaccessibleviatherunTestfunctionareavailableviathefollowingtwofunctions:ကwhichTests()[1]"fisher""ks""t""globaltest""sum""ks.ties"ကwhichAlgorithms()[1]"classic""elim""weight""weight01""lea""parentchild"ThereisnoadvantageofusingtherunTest()overgetSigGroups()exceptthatitismoreuserfriendlyanditgivescleanercode.However,iftheuserwantstode nehisownteststatisticorimplementanewalgorithmbasedontheavailablegroupStatsclasses,thenitwouldbenotpossibletousetherunTestfunction.Finally,thefunctioncanpassextraargumentstotheinitialisationmethodforangroupStatsobject.Thus,onecanspecifydi erentcuto sfortheelimmethod,orargumentsfortheweightmethod.7InterpretationandvisualizationofresultsThissectionpresenttheavailabletoolsforanalysingandinterpretingtheresultsoftheperformedtests.BothgetSigGroupsandrunTestfunctionsreturnanobjectoftypetopGOresult,andmostofthefollowingfunctionsworkwiththisobject.7.1ThetopGOresultobjectThestructureofthetopGOresultobjectisquitesimple.Itcontainsthep-valuesorthestatisticsreturnedbythetestandbasicinformationsontheusedteststatistic/algorithm.TheinformationstoredinthetopGOdataobjectisnotcarriedoverthisobject,andbothoftheseobjectswillbeneededbythediagnostictools.Sincetheteststatisticcanreturneitherap-valueorastatisticofthedata,wewillreferthemasscores!Toaccessthestoredp-values,theusershouldusethefunctionscore.Itreturnsanamednumericvector,werethenamesareGOidenti ers.Forexample,wecanlookatthehistogramoftheresultsoftheFisher'sexacttestandtheclassicalgorithm.Bydefault,thescorefunctiondoesnotwarrantytheorderinwhichthep-valuesarereturned,aswecanseeifwecomparetheresultFisobjectwiththeresultWeightobject:ကhead(score(resultWeight))GO:1902894GO:0048812GO:1902895GO:0048813GO:0071870GO:00706691.00000000.86365940.28169631.00000000.82248521.0000000 �pvalFis-score(resultFis)ကhead(pvalFis)GO:0000002GO:0000003GO:0000018GO:0000038GO:0000041GO:00000700.559646740.711209820.153506570.010701950.855953790.93563908ကhist(pvalFis,50,xlab="p-values") However,thescoremethodhasaparameter,whichGO,whichtakesalistofGOidenti ersandreturnsthescoresforthesetermsinthespeci edorder.Onlythescoresforthetermsfoundintheintersectionbetweenthespeci edGOsandtheGOsstoredinthetopGOresultobjectarereturned.Toseehowthisworkletscomputethecorrelationbetweenthep-valuesoftheclassicandweightmethods:�pvalWeight-score(resultWeight,whichGO=names(pvalFis))ကhead(pvalWeight)GO:0000002GO:0000003GO:0000018GO:0000038GO:0000041GO:00000700.559646740.987439471.000000000.010701950.950798091.00000000ကcor(pvalFis,pvalWeight)[1]0.5449325BasicinformationoninputdatacanbeaccessedusingthegeneDatafunction.Thenumberofannotatedgenes,thenumberofsigni cantgenes(ifitisthecase),theminimalsizeofaGOcategoryaswellasthenumberofGOcategorieswhichhaveatleastonesigni cantgeneannotatedarelisted:ကgeneData(resultWeight)AnnotatedSignificantNodeSizeSigTerms3896339542867.2SummarisingtheresultsWecanusetheGenTablefunctiontogenerateasummarytablewiththeresultsfromoneormoretestsappliedtothesametopGOdataobject.ThefunctioncantakeavariablenumberoftopGOresultobjectsanditreturnsadata.framecontainingthetoptopNodesGOtermsidenti edbythemethodspeci edwiththeorderByargument.Thisargumentallowstheuserdecidewhichp-valuesshouldbeusedfororderingtheGOterms. GO.IDTermAnnotatedSigni cantExpectedRankinclassicclassicKSweight 1GO:0045061thymicTcellselection1281.0411.1e-060.000161.1e-062GO:0031580membraneraftdistribution650.52112.7e-050.000992.7e-053GO:0042102positiveregulationofTcellproliferat...45133.92207.3e-050.010507.3e-054GO:0042473outerearmorphogenesis750.61218.8e-050.003828.8e-055GO:0048935peripheralnervoussystemneurondevelop...750.61228.8e-050.002478.8e-056GO:0010863positiveregulationofphospholipaseCa...1571.31250.000120.011990.000127GO:2001187positiveregulationofCD8-positive,alp...850.70330.000220.005750.000228GO:0043651linoleicacidmetabolicprocess1261.04370.000240.008440.000249GO:0050854regulationofantigenreceptor-mediated...38113.31380.000260.002190.0002610GO:0036109alpha-linolenicacidmetabolicprocess540.44400.000260.002560.0002611GO:0042474middleearmorphogenesis950.78450.000450.017610.0004512GO:0045586regulationofgamma-deltaTcelldi ere...950.78460.000450.013790.0004513GO:0034030ribonucleosidebisphosphatebiosynthetic...1871.57480.000490.000870.0004914GO:0034033purinenucleosidebisphosphatebiosynthe...1871.57490.000490.000870.0004915GO:0032760positiveregulationoftumornecrosisfa...2482.09520.000640.039270.0006416GO:0090270regulationof broblastgrowthfactorp...640.52570.000730.016530.0007317GO:0090181regulationofcholesterolmetabolicproc...2071.74630.001020.003470.0010218GO:0006636unsaturatedfattyacidbiosyntheticproc...2482.09530.000640.019510.0010319GO:0048304positiveregulationofisotypeswitching...740.61770.001590.009640.0015920GO:0010453regulationofcellfatecommitment840.70910.002970.061050.00297 Table3:Signi canceofGOtermsaccordingtodi erenttests.�allRes-GenTable(GOdata,classic=resultFis,KS=resultKS,weight=resultWeight,+orderBy="weight",ranksOf="classic",topNodes=20)Pleasenotethatweneedtotypethefullnames(theexactname)ofthefunctionarguments:topNodes,rankOf,etc.Thisisthepricepaidfor exibilityofspecifyingdi erentnumberoftopGOresultsobjects.ThetableincludesstatisticsontheGOtermsplusthep-valuesreturnedbytheotheralgorithms/teststatistics.Table3showstheinformationsincludedinthedata.frame.7.3AnalysingindividualGOsNextwewanttoanalysethedistributionofthegenesannotatedtoaGOtermofinterest.Inanenrichmentanalysisoneexpectsthatthegenesannotatedtoasigni cantlyenrichedGOtermhavehigherscoresthantheaveragegene'scoreofthegeneuniverse.Onewaytocheckthishypothesisistocomparethedistributionofthegenescoresannotatedtothespeci edGOtermwiththedistributionofthescoresofthecomplementarygeneset(allthegenesinthegeneuniversewhicharenotannotatedtotheGOterm).ThiscanbeeasilyachievedusingtheshowGroupDensityfunction.Forexample,letslookatthedistributionofthegenesannotatedtothemostsigni cantGOtermw.r.t.theweightalgorithm.ကgoID-allRes[1,"GO.ID"]ကprint(showGroupDensity(GOdata,goID,ranks=TRUE)) Figure5:Distributionofthegene'rankfromGO:0045061,comparedwiththenulldistribution. WecanseeinFigure5thatthegenesannotatedtoGO:0045061havelowranks(geneswithlowp-valueofthet-test).Thedistributionoftheranksisskewedontheleftsidecomparedwiththereferencedistributiongivenbythecomplementarygeneset.Thisisaniceexampleinwhichthereisasigni cantdi erenceinthedistributionofscoresbetweenthegenesetandthecomplementaryset,andweseefromTable3thatthisGOisfoundassigni cantlyenrichedbyallmethodsused.Intheaboveexample,thegeneswithap-valueequalto1wereomitted.TheycanbeincludedusingthevalueFALSEfortherm.oneargumentintheshowGroupDensityfunction.AnotherusefulfunctionforanalysingtermsofinterestisprintGenes.Thefunctionwillgenerateatablewithallthegenes/probesannotatedtothespeci edGOterm.Varioustypeofidenti ers,thegenenameandthep-values/statisticsareprovidedinthetable.�goID-allRes[10,"GO.ID"]ကgt-printGenes(GOdata,whichTerms=goID,chip=affyLib,numChar=40) ChipIDLL.idSymbol.idGenenamerawp-value 33821 at33821 at60481ELOVL5ELOVLfattyacidelongase53.53e-2139372 at39372 atNANA2.99e-1435117 at35117 at60481ELOVL5ELOVLfattyacidelongase51.45e-0932190 at32190 at9415FADS2fattyaciddesaturase27.68e-0740415 at40415 at30ACAA1acetyl-CoAacyltransferase11 Table4:GenesannotatedtoGO:0036109.Thedata.framecontainingthegenesannotatedtoGO:0036109isshowninTable4.OneormoreGOidenti erscanbegiventothefunctionusingthewhichTermsargument.WhenmorethanoneGOisspeci ed,thefunctionreturnsalistofdata.frames,otherwiseonlyonedata.frameisreturned.Thefunctionhasaargument lewhich,whenspeci ed,willsavetheresultsintoa leusingtheCSVformat.ForthemomentthefunctionwillworkonlywhenthechipusedhasanannotationpackageavailableinBioconductor.Itwillnotworkwithothertypeofcustomannotations.7.4VisualisingtheGOstructureAninsightfulwayoflookingattheresultsoftheanalysisistoinvestigatehowthesigni cantGOtermsaredistributedovertheGOgraph.Weplotthesubgraphsinducedbythemostsigni cantGOtermsreportedbyclassicandweightmethods.Therearetwofunctionsavailable.TheshowSigOfNodeswillplottheinducedsubgraphtothecurrentgraphicdevice.TheprintGraphisawarpingfunctionofshowSigOfNodesandwillsavetheresultinggraphintoaPDForPS le.�showSigOfNodes(GOdata,score(resultFis),firstSigNodes=5,useInfo='all')�showSigOfNodes(GOdata,score(resultWeight),firstSigNodes=5,useInfo='def')�printGraph(GOdata,resultFis,firstSigNodes=5,fn.prefix="tGO",useInfo="all",pdfSW=TRUE)�printGraph(GOdata,resultWeight,firstSigNodes=5,fn.prefix="tGO",useInfo="def",pdfSW=TRUE)Intheplots,thesigni cantnodesarerepresentedasrectangles.Theplottedgraphistheupperinducedgraphgeneratedbythesesigni cantnodes.Thesegraphplotsareusedtoseehowthesigni cantGOtermsaredistributedinthehierarchy.Itisaveryusefultooltorealizebehaviourofvariousenrichmentmethodsandtobetterunderstandwhichofthesigni cantGOtermsarereallyofinterest.Wecanemphasisedi erencesbetweentwomethodsusingtheprintGraphfunction:�printGraph(GOdata,resultWeight,firstSigNodes=10,resultFis,fn.prefix="tGO",useInfo="def")�printGraph(GOdata,resultElim,firstSigNodes=15,resultFis,fn.prefix="tGO",useInfo="all") Figure6:Thesubgraphinducedbythetop5GOtermsidenti edbytheclassicalgorithmforscoringGOtermsforenrichment.Boxesindicatethe5mostsigni cantterms.Boxcolorrepresentstherelativesigni cance,rangingfromdarkred(mostsigni cant)tolightyellow(leastsigni cant).Blackarrowsindicateis-arelationshipsandredarrowspart-ofrelationships. Figure7:Thesubgraphinducedbythetop5GOtermsidenti edbytheweightalgorithmforscoringGOtermsforenrichment.Boxesindicatethe5mostsigni cantterms.Boxcolorrepresentstherelativesigni cance,rangingfromdarkred(mostsigni cant)tolightyellow(leastsigni cant).Blackarrowsindicateis-arelationshipsandredarrowspart-ofrelationships. 8SessionInformationTheversionnumberofRandpackagesloadedforgeneratingthevignettewere:ˆRversion4.0.3(2020-10-10),x86_64-pc-linux-gnuˆLocale:LC_CTYPE=en_US.UTF-8,LC_NUMERIC=C,LC_TIME=en_US.UTF-8,LC_COLLATE=C,LC_MONETARY=en_US.UTF-8,LC_MESSAGES=en_US.UTF-8,LC_PAPER=en_US.UTF-8,LC_NAME=C,LC_ADDRESS=C,LC_TELEPHONE=C,LC_MEASUREMENT=en_US.UTF-8,LC_IDENTIFICATION=CˆRunningunder:Ubuntu18.04.5LTSˆMatrixproducts:defaultˆBLAS:/home/biocbuild/bbs-3.12-bioc/R/lib/libRblas.soˆLAPACK:/home/biocbuild/bbs-3.12-bioc/R/lib/libRlapack.soˆBasepackages:base,datasets,grDevices,graphics,grid,methods,parallel,stats,stats4,utilsˆOtherpackages:ALL1.32.0,AnnotationDbi1.52.0,Biobase2.50.0,BiocGenerics0.36.0,GO.db3.12.1,IRanges2.24.0,Rgraphviz2.34.0,S4Vectors0.28.0,SparseM1.78,gene lter1.72.0,graph1.68.0,hgu95av2.db3.2.3,lattice0.20-41,multtest2.46.0,org.Hs.eg.db3.12.0,topGO2.42.0,xtable1.8-4ˆLoadedviaanamespace(andnotattached):DBI1.1.0,MASS7.3-53,Matrix1.2-18,R62.5.0,RSQLite2.2.1,Rcpp1.0.5,XML3.99-0.5,annotate1.68.0,bit4.0.4,bit644.0.5,blob1.2.1,compiler4.0.3,digest0.6.27,httr1.4.2,matrixStats0.57.0,memoise1.1.0,pkgcon g2.0.3,rlang0.4.8,splines4.0.3,survival3.2-7,tools4.0.3,vctrs0.3.4ReferencesAckermann,M.andStrimmer,K.(2009).Ageneralmodularframeworkforgenesetenrichmentanalysis.BMCBioinformatics,10:47.10.1186/1471-2105-10-47.Alexa,A.,Rahnenfuhrer,J.,andLengauer,T.(2006).Improvedscoringoffunctionalgroupsfromgeneexpressiondatabydecorrelatinggographstructure.Bioinformatics(Oxford,England),22:1600{1607.10.1093/bioinformatics/btl140.Chiaretti,S.,etal.(2004).Geneexpressionpro leofadultT-cellacutelymphocyticleukemiaidenti esdistinctsubsetsofpatientswithdi erentresponsetotherapyandsurvival.Blood,103(7):2771{2778.Consortium,G.O.(2001).CreatingtheGeneOntologyResource:DesignandImplementation.GenomeResearch,11:1425{1433.ColdSpringHarborLaboratoryPress.Draghici,S.,Sellamuthu,S.,andKhatri,P.(2006).Babel'stowerrevisited:auniversalresourceforcross-referencingacrossannotationdatabases.Bioinformatics(Oxford,England),22:btl372v1{2939.10.1093/bioinformatics/btl372.Goeman,J.J.andBuhlmann,P.(2007).Analyzinggeneexpressiondataintermsofgenesets:methodologicalissues.Bioinformatics(Oxford,England),23:980{987.10.1093/bioinformatics/btm051.Grossmann,S.,Bauer,S.,Robinson,P.N.,andVingron,M.(2007).Improveddetectionofoverrepresentationofgene-ontologyannotationswithparentchildanalysis.Bioinformatics,23:3024.10.1093/bioinformat-ics/btm440.YonRhee,S.,Wood,V.,Dolinski,K.,andDraghici,S.(2008).Useandmisuseofthegeneontologyannotations.Naturereviews.Genetics,advancedonlinepublication:509{515.10.1038/nrg2363.