Package expands, February. Type: Package. Title: ExPANdS


Package 'expands'
June 30, 2018

Type Package
Title Expanding Ploidy and Allele-Frequency on Nested Subpopulations
Version 2.1.2
Date 2018-06-29
Author Noemi Andor
Maintainer Noemi Andor <expands.r@gmail.com>
Description Expanding Ploidy and Allele Frequency on Nested Subpopulations (expands) characterizes coexisting subpopulations in a single tumor sample using copy number and allele frequencies derived from exome- or whole genome sequencing input data (http://www.ncbi.nlm.nih.gov/pubmed/24177718). The model detects coexisting genotypes by leveraging run-specific tradeoffs between depth of coverage and breadth of coverage. This package predicts the number of clonal expansions, the size of the resulting subpopulations in the tumor bulk, the mutations specific to each subpopulation, tumor purity and phylogeny. The main function runExPANdS() provides the complete functionality needed to predict coexisting subpopulations from single nucleotide variations (SNVs) and associated copy numbers. The robustness of subpopulation predictions increases with the number of mutations provided. It is recommended that at least 200 mutations are used as input to obtain stable results. Updates in version 2.1 include: (i) a new parameter, ploidy, in runExPANdS.R allows specification of non-diploid background ploidies (e.g. for near-triploid cell lines); (ii) a parallel computing option is available. Further documentation and FAQ are available at http://dna-discovery.stanford.edu/software/expands.
License GPL-2
URL http://dna-discovery.stanford.edu/software/expands, https://github.com/noemiandor/expands, https://groups.google.com/d/forum/expands
Depends R (>= 2.10)
Imports flexclust, plyr, RColorBrewer, gplots, NbClust, moments (>= 0.13), rJava (>= 0.5-0), flexmix (>= 2.3), matlab (>= 0.8.9), ape (>= 3.2), commonsMath (>= 1.1), parallel
Suggests phylobase (>= 0.6.8)
SystemRequirements Java (>= 5.0)
NeedsCompilation no
Repository CRAN
Date/Publication 2018-06-30 15:08:14 UTC
RoxygenNote 6.0.1

R topics documented:
assignMutations
assignQuantityToMutation
assignQuantityToSP
buildMultiSamplePhylo
buildPhylo
cbs
cellfrequency_pdf
clusterCellFrequencies
computeCellFrequencyDistributions
gatherEXPANDSoutput
plotSPs
roi
runExPANdS
simulation
snv
Index

assignMutations    Mutation Assignment

Description
Assigns mutations to previously predicted subpopulations.

Usage
assignMutations(dm, finalSPs, max_PM = 6, cnvSPs = NULL, ploidy = 2, verbose = T)

Arguments
dm: Matrix in which each row corresponds to a mutation. Has to contain at least the following column names: chr - the chromosome on which each mutation is located; startpos - the genomic position of each mutation; AF_Tumor - the allele frequency of each mutation; PN_B - the count of the B-allele in normal (non-tumor) cells (binary variable: 1 if the mutation is a germline variant, 0 if somatic).
finalSPs: Matrix in which each row corresponds to a subpopulation, as calculated by clusterCellFrequencies.
max_PM: Upper threshold for the number of amplicons per mutated cell. See also cellfrequency_pdf.
cnvSPs: Matrix in which each row corresponds to a subpopulation, as calculated by clusterCellFrequencies. If not set, finalSPs will be used to assign CNVs as well as SNVs.
ploidy: The background ploidy of the sequenced sample (default: 2). Changing the value of this parameter is not recommended. Dealing with cell lines or tumor biopsies of very high (>= 0.95) tumor purity is a necessary but not sufficient condition to change the value of this parameter.
verbose: Give a more verbose output.

Details
Each mutated locus l is assigned to the subpopulation C whose size f_C can best explain the allele frequency (AF) and copy number (CN) observed at l. Four alternative cell frequency probabilities, P_x(f_C), are calculated for the SNV at locus l, with x denoting one of the four alternative evolutionary scenarios (see also cellfrequency_pdf). The SNV is assigned to subpopulation:
$C := \arg\max_{C} \left( P_s(f_C), P_p(f_C), P_c(f_C), P_i(f_C) \right)$
(see cellfrequency_pdf). The mutated loci assigned to each subpopulation cluster represent the genetic profile of each predicted subpopulation. The assignment between subpopulation C and locus l only implies that the SNV at l was first propagated during the clonal expansion that gave rise to C. SNVs present in C may therefore not be exclusive to C, but may also be present in subpopulations smaller than C. Whether or not this is the case can sometimes be inferred from the phylogenetic structure of the subpopulation composition. See also buildPhylo.

Value
A list with two fields:
dm: The input matrix with seven additional columns: SP - subpopulation to which the point mutation has been assigned; PM_B - count of the B-allele at the mutated genomic locus, in the assigned subpopulation (SP); PM - total count of all alleles, in the assigned subpopulation (SP); SP_cnv - if the point mutation lies within an amplified or deleted region: the subpopulation to which the copy number variation has been assigned. This entry has the same value as SP if and only if: i) the SNV and the CNV were propagated during the same clonal expansion, or ii) the SNV lies within a copy neutral region; PM_B_cnv - count of the B-allele, in the CNV harboring subpopulation (SP_cnv); PM_cnv - total count of all alleles, in the CNV harboring subpopulation (SP_cnv); %maxP - confidence of the assigned SP/SP_cnv scenario.
finalSPs: The input matrix of subpopulations with column nMutations updated according to the total number of mutations assigned to each subpopulation.

Author(s)
Noemi Andor

References
Li, B. & Li, J.Z. (2014). A general framework for analyzing tumor subclonality using SNP array and DNA sequencing data. Genome Biol.

See Also
clusterCellFrequencies

assignQuantityToMutation    Quantity assignment (copy number) to mutations

Description
Assigns a quantity to each mutated locus. Currently, the only assignable quantity is the average copy number (among all cells) of the locus in which the mutation is embedded.

Usage
assignQuantityToMutation(dm, cbs, quantityColumnLabel = "CN_Estimate", verbose = T)

Arguments
dm: Matrix in which each row corresponds to a mutation. Has to contain at least the following column names: chr - the chromosome on which each mutation is located; startpos - the genomic position of each mutation.
cbs: Matrix in which each row corresponds to a copy number segment as calculated by a circular binary segmentation algorithm. Has to contain at least the following column names: chr - chromosome; startpos - the first genomic position of a copy number segment; endpos - the last genomic position of a copy number segment; CN_Estimate - the copy number estimated for each segment.
quantityColumnLabel: The name of the new column. Valid options are: FPKM, CN_Estimate.
verbose: Give a more verbose output.
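The copy number lookup described above amounts to an interval overlap between mutation positions and segments. The sketch below illustrates that idea under stated assumptions. assign_segment_cn is a hypothetical helper, not part of expands. It assumes numeric chromosome codes and inclusive segment bounds, and it takes the first overlapping segment when several overlap.

```r
# Hypothetical helper, not the expands implementation: for each mutation,
# find a copy number segment on the same chromosome whose inclusive
# [startpos, endpos] interval contains the mutation's startpos, and copy
# its CN_Estimate into a new column. Mutations outside every segment get NA.
assign_segment_cn <- function(dm, cbs, label = "CN_Estimate") {
  cn <- rep(NA_real_, nrow(dm))
  for (i in seq_len(nrow(dm))) {
    hit <- which(cbs[, "chr"] == dm[i, "chr"] &
                 cbs[, "startpos"] <= dm[i, "startpos"] &
                 cbs[, "endpos"] >= dm[i, "startpos"])
    if (length(hit) > 0) cn[i] <- cbs[hit[1], "CN_Estimate"]  # first overlap wins
  }
  out <- cbind(dm, cn)
  colnames(out)[ncol(out)] <- label
  out
}

# Toy inputs (hypothetical values)
toy_snv <- cbind(chr = c(1, 1, 2, 4), startpos = c(100, 5000, 300, 10))
toy_cbs <- cbind(chr = c(1, 1, 2), startpos = c(1, 1001, 1),
                 endpos = c(1000, 9000, 2000), CN_Estimate = c(2, 3.1, 1.6))
dm <- assign_segment_cn(toy_snv, toy_cbs)
dm[, "CN_Estimate"]  # 2.0 3.1 1.6 NA
```

The loop favors readability over speed. For exome-scale inputs, a sorted-interval approach such as findInterval() per chromosome would scale better.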
Value
dm: The input matrix with three additional columns: quantityID - the ID of the assigned quantity; quantityColumnLabel - the quantity.

Author(s)
Noemi Andor

Examples
library(expands)
data(cbs)
data(snv)
dm = assignQuantityToMutation(snv, cbs, quantityColumnLabel = "CN_Estimate")

assignQuantityToSP    Quantity assignment (copy number) to subpopulations

Description
Assigns quantities to predicted subpopulations. Currently, the only assignable quantities are subpopulation-specific copy number states for the input genome segments.

Usage
assignQuantityToSP(cbs, dm, C = list(sp = c("SP", "SP_cnv"), pm = c("PM", "PM_cnv")), e = 1, v = T)

Arguments
cbs: Matrix in which each row corresponds to a copy number fragment as computed by a circular binary segmentation algorithm. Has to contain at least the following column names: chr - chromosome; startpos - the first genomic position of a copy number segment; endpos - the last genomic position of a copy number segment; CN_Estimate - the copy number estimated for each segment (weighted average value across all subpopulations in the sample).
dm: Matrix in which each row corresponds to a mutation. Has to contain at least the following column names: chr - chromosome on which each point mutation is located; startpos - genomic position of each mutation; SP - subpopulation to which the point mutation has been assigned; SP_cnv - subpopulation with a copy number variation within the same genomic segment in which SP has a point mutation; PM - total count of all alleles in the subpopulation with the point mutation (SP); PM_cnv - total count of all alleles in the subpopulation with the copy number variation (SP_cnv).
C: List referencing column names in the mutation matrix, with two fields: sp - column names holding subpopulation sizes (typically "SP", "SP_cnv"); pm - column names holding the total allele counts assigned for each subpopulation (typically "PM", "PM_cnv").
e: Maximum variance of subpopulation-specific copy numbers for a given segment, above which the segment will remain unassigned for the corresponding subpopulation. Determines whether or not to assign copy number to a subpopulation, SP_i, for a segment containing multiple SP_i-specific copy numbers, at least two of which are distinct.
v: Give a more verbose output.

Value
The input copy number matrix with one additional column for each predicted subpopulation: SP_xx - where xx is the size of the corresponding subpopulation. Column entries contain the copy number of each segment in SP; value NA indicates that no copy number could be inferred for the segment in this subpopulation (either because the subpopulation had no point mutations/CNVs within the segment, or because it had multiple, ambiguous copy number assignments within the segment).

Author(s)
Noemi Andor

buildMultiSamplePhylo    Relations between inter- and intra-sample subpopulations

Description
Predicts phylogenetic relations between subpopulations from subpopulation-specific copy number and point mutation profiles, while including information about the sample origin of each subpopulation. This function differs from buildPhylo in that it integrates the subpopulations predicted in multiple, geographically distinct tumor samples into one common phylogeny, and in that it includes point mutations in addition to copy number variations to infer inter-sample phylogenetic relations.

Usage
buildMultiSamplePhylo(samGr, out, treeAlgorithm = "bionjs", e = 0, plotF = 1, spRes = 1, v = F)

Arguments
samGr: List with three fields:
    cbs - Input of runExPANdS: matrix in which each row corresponds to a copy number segment. CBS is typically the output of a circular binary segmentation algorithm. Columns in CBS must be labeled and must include chr, startpos, endpos and CN_Estimate (see cbs). Do not use the output of runExPANdS here.
    sps - Output of runExPANdS. Matrix in which each row corresponds to a somatic mutation. Columns must include: chr - the chromosome on which each mutation is located; startpos - the genomic position of each mutation; SP - the subpopulation to which the mutation has been assigned; PM - the total count of all alleles at the mutated genomic locus, in the assigned subpopulation; PM_B - the count of the B-allele at the mutated genomic locus, in the assigned subpopulation; CN_Estimate - the average copy number (among all cells) of the locus in which the mutation is embedded (see also assignQuantityToMutation).
    labels - Label denoting the sample origin of each subpopulation matrix. Entry is mandatory for each geographical sample.
out: Prefix of the file to which the multi-sample phylogeny will be saved.
treeAlgorithm: Neighbor joining algorithm used for phylogeny reconstruction (from library ape). Options: bionjs (default), njs.
e: Input parameter "e" for the called function assignQuantityToSP.
plotF: Option for displaying the phylogenetic tree (0 - no display; 1 - display).
spRes: Option on whether or not to ignore the subpopulations calculated for each sample and instead treat every geographical tumor sample as one single tumor metapopulation (default value: 1 - subpopulation resolution; 0 - metapopulation resolution).
v: Give a more verbose output.

Details
This function does not change the subpopulation membership of SNVs. Instead it reconstructs phylogenetic relationships between subpopulations using neighbor-joining algorithms provided by R package 'ape'. Pairwise distances between subpopulations i and j are calculated as:
$d_{ij} := \frac{cnv_{i=j} + snv_{i=j}}{cnv_{ij} + snv_{ij}}$
where cnv_{i=j} is the number of copy number segments for which subpopulations i and j have the same copy number; snv_{i=j} is the number of point mutations for which subpopulations i and j have the same mutation status; and cnv_{ij}, snv_{ij} are the total numbers of copy number segments and mutations, respectively, for which both subpopulations have available information. Subpopulations with insufficient copy number and point mutation information are excluded from the phylogeny.

Value
An object of class "phylo" (library ape).

Author(s)
Noemi Andor

See Also
buildPhylo

buildPhylo    Relations between subpopulations
Description
Predicts phylogenetic relations between subpopulations from subpopulation-specific copy number profiles.

Usage
buildPhylo(sp_cbs, outF, treeAlgorithm = "bionjs", dm = NA, add = "Germline", verbose = T)

Arguments
sp_cbs: Subpopulation-specific copy number matrix in which each row corresponds to a copy number segment. Has to contain at least one column for each predicted subpopulation. Subpopulation column names must be labeled SP_xx, where xx is the size of the corresponding subpopulation. Input parameter sp_cbs can be obtained by calling assignQuantityToSP.
outF: Prefix of the file to which the phylogeny will be saved.
treeAlgorithm: Neighbor joining algorithm used for phylogeny reconstruction (from library ape). Options: bionjs (default), njs.
dm: Optional matrix in which each row corresponds to a mutation. Only mutations located on autosomes should be included. Columns in dm must be labeled and must include: SP - subpopulation to which the point mutation has been assigned; SP_cnv - subpopulation to which the CNV (overlapping with the point mutation) has been assigned (if a CNV is present); chr - chromosome on which each point mutation is located; startpos - genomic position of each point mutation; PM - total count of all alleles at the mutated genomic locus, in the assigned subpopulation; PM_B - count of the B-allele at the mutated genomic locus, in the assigned subpopulation. If dm is available, an attempt will be made to assign every mutation to >= 1 subpopulation according to the inferred phylogenetic relations between subpopulations.
add: Artificial subpopulation to be included in the phylogeny (options: 'Germline', 'Consensus', NULL).
verbose: Give a more verbose output.

Details
Reconstructs phylogenetic relationships between subpopulations using neighbor-joining algorithms provided by R package 'ape'. Pairwise distances between subpopulations are calculated as the number of copy number segments for which both subpopulations have the same copy number, divided by the total number of copy number segments for which both subpopulations have available copy number information. Subpopulations with insufficient copy number information are excluded from the phylogeny.

Value
List with two fields:
tree: An object of class "phylo" (library ape).
dm: The input matrix with each row representing a point mutation and additional columns: SP_xx - where xx is the size of the corresponding subpopulation. Column entries contain a binary indicator of whether or not the point mutation in this row is present in SP_xx.

Author(s)
Noemi Andor

See Also
assignQuantityToSP

cbs    Matrix of copy number fragments

Description
Copy number segments as obtained by circular binary segmentation. Data is derived from a Glioblastoma tumor (TCGA-06-0152-01).

Usage
data(cbs)

Format
Numeric matrix with 120 rows (one per copy number segment) and 4 columns: chr - the chromosome; startpos - genomic position at which the copy number segment starts; endpos - genomic position at which the copy number segment ends; CN_Estimate - average copy number of the segment among all cells.

Source
Data derived from The Cancer Genome Atlas (TCGA).

cellfrequency_pdf    Computes the probability distribution of cellular frequencies for a single mutation.
Description
Calculates P, the probability density distribution of cellular frequencies for one single point mutation or CNV. For each cell frequency f, the value of P(f) reflects the probability that the mutation is present in a fraction f of cells.

Usage
cellfrequency_pdf(af, cnv, pnb, freq, max_PM = 6, ploidy = 2, enforceCoocurrence = T)

Arguments
af: The allelic frequency at which the point mutation has been observed.
cnv: The average copy number of the locus in which the mutation is embedded.
pnb: The count of the B-allele in normal cells (binary variable: 1 if the mutation is a germline variant, 0 if somatic). B-alleles that have > 1 copy in normal cells are not modeled.
freq: Vector of cellular frequencies at which the probabilities will be calculated.
max_PM: Upper threshold for the number of amplicons per mutated cell. max_PM is the maximum number of amplicons above which solutions are rejected in the cell-frequency estimation step described below, i.e. $PM \leq \mathit{max\_PM}$. The choice of max_PM should depend on the genomic depth of coverage and on the fraction of the genome sequenced: the higher the quality and abundance of data, the higher max_PM.
ploidy: The background ploidy of the sequenced sample (default: 2). Changing the value of this parameter is not recommended. Dealing with cell lines or tumor biopsies of very high (>= 0.95) tumor purity is a necessary but not sufficient condition to change the value of this parameter.
enforceCoocurrence: Whether or not to enforce the assumption that an overlapping SNV and CNV were co-propagated as part of the same clonal expansion.

Details
We consider two types of molecular mechanisms that convert a locus into its mutated state: copy number variation (CNV) inducing events and single nucleotide variation (SNV) inducing events. We assume that a normal state is defined by a total allele count of two and a B-allele count below two, whereas a mutated state has an increased fraction of B-alleles. The conditions defining these states for each locus l are as follows:
i) $PM_B, PN_B, PM, PN \in \mathbb{N}$;
ii) $PM_B \geq 1$, $PN_B \leq 1$, $PN = 2$;
iii) $\frac{PM_B}{PM} > \frac{PN_B}{PN}$.
PM_B and PN_B denote the count of the B-allele in each cell type: mutated cells and normal cells, respectively. The value of PN_B is one if l has a germline variant, zero otherwise. PM and PN are the total allele counts of mutated cells and normal cells. PM is required to be between one and max_PM, that is, we exclude solutions for which the maximum number of amplicons per cell exceeds the user-defined value of max_PM. The function returns the probability distribution, P(f), that the mutation at locus l is present in a fraction f of cells, where $f \in [0, 1]$.

Four alternative cell frequency probability distribution scenarios, P(f), can be obtained for each allele frequency and copy number pair (AF, CN). For each scenario, the model starts with a germline population that will be the root of all other modeled subpopulations. The first subpopulation (f_cnv) modeled to evolve from the germline population is always the one carrying a CNV:
$pm \cdot f_{cnv} + PN \cdot (1 - f_{cnv}) = CN$
where pm is the total allele count of f_cnv. A subsequent subpopulation (f_snv) is always defined by an SNV and is modeled in relation to f_cnv, either as:
1. P_s(f) - its sibling: $PM_B \cdot f_{snv} + PN_B \cdot (1 - f_{snv}) = AF \cdot CN$, where $f_{snv} + f_{cnv} = 1$; $PM_B = 2$.
2. P_p(f) - its parent: $PM_B \cdot (f_{snv} - f_{cnv}) + pm_B \cdot f_{cnv} + PN_B \cdot (1 - f_{snv}) = AF \cdot CN$, where $f_{snv} > f_{cnv}$; $PM_B = 2$, and pm_B is the B-allele count of f_cnv.
3. P_c(f) - its child: $PM_B \cdot f_{snv} + PN_B \cdot (1 - f_{snv}) = AF \cdot CN$, where $f_{snv} < f_{cnv}$; $PM_B = pm$.
4. P_i(f) - itself: $PM_B \cdot f + PN_B \cdot (1 - f) = AF \cdot CN$, where $f = f_{snv} = f_{cnv}$; $PM_B = pm$.
Under 1), SNV and CNV are completely independent, as they are never co-propagated during the same clonal expansion. Under 2) and 3), SNV and CNV are partially dependent, yet present in two distinct subpopulations. Under 4), both the SNV and a CNV at l were propagated during the same clonal expansion.

Value
List with four components:
p: The probability that the point mutation/CNV is present in a fraction f of cells, for each input frequency f in parameter freq.
bestF: The cellular frequency that best explains the observed allele frequency and/or copy number.

Author(s)
Noemi Andor

References
Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.
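Each scenario equation is linear in the unknown fraction, so for fixed integer allele counts it can be solved for f in closed form. The sketch below inverts two of the equations above, the CNV subpopulation equation and the "itself" scenario P_i(f), by enumerating integer counts up to max_PM. The helper names solve_fcnv and solve_itself are hypothetical. The code returns point solutions only; it does not build the density P(f) that cellfrequency_pdf returns, and it does not apply the scenario-specific constraints on PM_B.

```r
# Sketch only, not the expands implementation.

# CNV subpopulation: pm * f_cnv + PN * (1 - f_cnv) = CN
solve_fcnv <- function(cn, PN = 2, max_PM = 6) {
  pm <- setdiff(1:max_PM, PN)               # pm == PN leaves f_cnv undetermined
  f <- (cn - PN) / (pm - PN)
  keep <- f > 0 & f <= 1                    # keep valid cellular fractions
  data.frame(pm = pm[keep], f_cnv = f[keep])
}

# "Itself" scenario P_i(f): PM_B * f + PN_B * (1 - f) = AF * CN
solve_itself <- function(af, cn, pnb, max_PM = 6) {
  pmb <- setdiff(1:max_PM, pnb)             # PM_B == PN_B leaves f undetermined
  f <- (af * cn - pnb) / (pmb - pnb)
  keep <- f > 0 & f <= 1
  data.frame(PM_B = pmb[keep], f = f[keep])
}

# Inputs from the cellfrequency_pdf example (af = 0.26, cnv = 1.95, pnb = 0)
cnv_sol <- solve_fcnv(cn = 1.95)
snv_sol <- solve_itself(af = 0.26, cn = 1.95, pnb = 0)
cnv_sol  # one solution: pm = 1, f_cnv = 0.05
snv_sol  # PM_B = 1..6 with f = 0.507 / PM_B
```

With CN = 1.95, only a single-copy loss (pm = 1) in about 5% of cells reproduces the observed copy number. The somatic SNV admits one candidate fraction per B-allele count, and the scenario constraints and the max_PM bound then narrow these candidates down.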
Examples
library(expands)
freq = seq(0.1, 1.0, by = 0.01)
cfd = cellfrequency_pdf(af = 0.26, cnv = 1.95, pnb = 0, freq = freq, max_PM = 6)
plot(freq, cfd$p, type = "l", xlab = "f", ylab = "P(f)")

clusterCellFrequencies    Clustering of cellular frequency probability distributions

Description
Calculates overrepresented cell frequencies using a two-step approach. Based on the assumption that passenger mutations occur within a cell prior to the driver event that initiates the expansion, each clonal expansion should be marked by multiple mutations. Mutations and copy number variations that took place in a cell prior to a clonal expansion should therefore be present in a similar fraction of cells and leave a similar "frequency-trace" during their propagation.

Usage
clusterCellFrequencies(densities, p, nrep = 30, min_CF = 0.1, verbose = T)

Arguments
densities: Matrix as obtained by computeCellFrequencyDistributions. Each row corresponds to a mutation and each column corresponds to a cellular frequency. Each value densities[i, j] represents the probability that mutation i is present in a fraction f of cells, where f is given by colnames(densities)[j].
p: Precision with which subpopulation size is predicted; a small value reflects a high resolution and can lead to a higher number of predicted subpopulations.
nrep: Positive integer indicating the number of algorithm repetitions (default: 30).
min_CF: Lower threshold for the prevalence of a mutated cell (default: 0.1).
verbose: Give a more verbose output.

Details
In the first step, mutations with similar cellular frequencies are grouped together by hierarchical cluster analysis of the probability distributions, using the Kullback-Leibler divergence as a distance measure. The cell frequency at each cluster maximum denotes the size of the subpopulation that harbors the clustered mutations. In the second step, each cluster is extended by members with similar distributions in an interval around the cluster maximum.

Value
SPs: Matrix of predicted subpopulations. Each row corresponds to a subpopulation and each column contains information about that subpopulation, such as its size in the sequenced tumor bulk (column MeanWeighted) and the noise score at which the subpopulation has been detected (column score: lower values indicate higher subpopulation detection confidence).
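The first clustering step described in Details can be sketched with base R's hclust(). Several choices here are assumptions rather than documented expands behavior: KL divergence is asymmetric, so it is symmetrized before clustering; the linkage is average; and the tree is cut into a fixed k = 2 groups for this toy. The second (extension) step and the nrep repetitions are omitted, and the densities are invented.

```r
# Toy input (hypothetical): 4 mutations x 5 cellular frequencies; each row
# is a probability distribution over the frequency grid.
freq <- c(0.2, 0.4, 0.6, 0.8, 1.0)
densities <- rbind(
  m1 = c(0.70, 0.20, 0.05, 0.03, 0.02),
  m2 = c(0.65, 0.25, 0.05, 0.03, 0.02),
  m3 = c(0.02, 0.03, 0.10, 0.60, 0.25),
  m4 = c(0.02, 0.03, 0.15, 0.55, 0.25))
colnames(densities) <- freq

# Kullback-Leibler divergence KL(p || q); assumes strictly positive entries.
kl <- function(p, q) sum(p * log(p / q))

# Symmetrized KL (an assumption) so the matrix can serve as a dissimilarity.
n <- nrow(densities)
d <- matrix(0, n, n, dimnames = list(rownames(densities), rownames(densities)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- kl(densities[i, ], densities[j, ]) + kl(densities[j, ], densities[i, ])
  }
}

# Step 1 only: hierarchical clustering, cut into two groups for this toy.
hc <- hclust(as.dist(d), method = "average")
groups <- cutree(hc, k = 2)

# Frequency at the maximum of each cluster's mean density: the candidate
# size of the subpopulation harboring the clustered mutations.
peaks <- sapply(split(rownames(densities), groups), function(ids)
  freq[which.max(colMeans(densities[ids, , drop = FALSE]))])
groups  # m1 and m2 share one cluster, m3 and m4 the other
peaks   # 0.2 and 0.8
```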
Author(s)
Noemi Andor

References
Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.

computeCellFrequencyDistributions    Gathering of cell frequency probability distributions

Description
Computes the probability distributions of cell frequencies by calling cellfrequency_pdf for each mutation separately.

Usage
computeCellFrequencyDistributions(dm, max_PM = 6, p, min_CF = 0.1, ploidy = 2, nc = 1, v = T)

Arguments
dm: Matrix in which each row corresponds to a mutation. Has to contain at least the following column names: chr - the chromosome on which each mutation is located; startpos - the position of each mutation; AF_Tumor - the allele frequency of each mutation; PN_B - the count of the B-allele in normal cells (binary variable: 1 if the mutation is a germline variant, 0 if somatic).
max_PM: Upper threshold for the number of amplicons per mutated cell (default: 6). See also cellfrequency_pdf.
p: Precision with which subpopulation size is predicted; a small value reflects a high resolution and can lead to a higher number of predicted subpopulations.
min_CF: Lower boundary for the prevalence of a mutated cell (default: 0.1).
ploidy: The background ploidy of the sequenced sample (default: 2). Changing the value of this parameter is not recommended. Dealing with cell lines or tumor biopsies of very high (>= 0.95) tumor purity is a necessary but not sufficient condition to change the value of this parameter.
nc: The number of nodes to be forked to run R in parallel.
v: Give a more verbose output.
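The gather pattern described here (one density row per mutation, optionally spread over nc forked processes) can be sketched with base R's parallel::mclapply(). This is an illustration under assumptions, not the package code. toy_pdf is a made-up Gaussian stand-in for cellfrequency_pdf, gather_densities is a hypothetical name, and the frequency grid seq(min_CF, 1, by = p) is assumed.

```r
library(parallel)

# Hypothetical stand-in for cellfrequency_pdf(): a Gaussian bump centred on
# AF * CN, normalised over the grid. NOT the expands model.
toy_pdf <- function(af, cn, freq, sd = 0.05) {
  d <- dnorm(freq, mean = af * cn, sd = sd)
  d / sum(d)
}

# One density row per mutation, computed on nc forked workers
# (mc.cores > 1 is not supported on Windows, so keep nc = 1 there).
gather_densities <- function(dm, p = 0.01, min_CF = 0.1, nc = 1) {
  freq <- seq(min_CF, 1, by = p)            # assumed frequency grid
  rows <- mclapply(seq_len(nrow(dm)), function(i)
    toy_pdf(dm[i, "AF_Tumor"], dm[i, "CN_Estimate"], freq), mc.cores = nc)
  densities <- do.call(rbind, rows)
  colnames(densities) <- freq
  # Column f: the grid frequency with the highest density for each mutation
  dm <- cbind(dm, f = freq[apply(densities, 1, which.max)])
  list(freq = freq, densities = densities, dm = dm)
}

dm <- cbind(AF_Tumor = c(0.25, 0.40), CN_Estimate = c(2, 2))
res <- gather_densities(dm)
res$dm[, "f"]  # approximately 0.5 and 0.8
```

Because each row depends only on its own mutation, the work parallelizes without coordination. The returned list mirrors the three fields described under Value.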
Value
List with three fields:
freq: The cellular frequencies for which probabilities are computed.
densities: Matrix in which each row corresponds to a point mutation and each column corresponds to a cellular frequency. Each value densities[i, j] represents the probability that mutation i is present in a fraction freq[j] of cells.
dm: The input matrix with column f updated according to the cellular frequency that best explains the observed allele frequency and copy number.

Author(s)
Noemi Andor

gatherEXPANDSoutput    Reading EXPANDS' output files

Description
Reads EXPANDS output files from a user-specified input directory.

Usage
gatherEXPANDSoutput(outDirEXPANDS, regex = "")

Arguments
outDirEXPANDS: Absolute path to the input directory in which EXPANDS results are stored, as generated by runExPANdS.
regex: Pattern that a path has to match in order to be read.

Value
Two-level nested list. The outer level contains one entry per output of runExPANdS. Each entry constitutes an inner list with fields: snv - the assignment of each SNV to a subpopulation; cbs - the copy number of each genomic segment in each subpopulation; spstats - matrix of predicted subpopulations; tree - the inferred phylogenetic relationships between subpopulations as an object of class "phylo4" (library phylobase); treeApe - the inferred phylogenetic relationships between subpopulations as an object of class "phylo" (library ape).

Author(s)
Noemi Andor

See Also
runExPANdS

plotSPs    Subpopulation Visualization
Description
Plots coexistent subpopulations determined by ExPANdS.

Usage
plotSPs(dm, sampleID = NA, cex = 0.5, legend = "CN_Estimate", orderBy = "chr", rawAF = F)

Arguments
dm: Matrix in which each row corresponds to a point mutation (for example, the matrix output of assignMutations). Has to contain at least the following column names: chr - the chromosome on which each mutation is located; startpos - the genomic position of each mutation; AF_Tumor - the allele frequency of each mutation; CN_Estimate - the absolute copy number estimated for each segment; PN_B - the count of the B-allele in normal cells (binary variable: 1 if the mutation is a germline variant, 0 if somatic); SP - the subpopulation to which each point mutation has been assigned (as fraction of cells in the tumor bulk); %maxP - the confidence with which the mutation has been assigned to the corresponding subpopulation; SP_cnv - the subpopulation to which the CNV has been assigned; PM - the total count of all alleles at the mutated genomic locus, in subpopulation SP; PM_cnv - the total count of all alleles at the mutated genomic locus, in subpopulation SP_cnv; PM_B - the count of the mutated allele in subpopulation SP; PM_B_cnv - the count of the mutated allele in subpopulation SP_cnv.
sampleID: The name of the sample in which the mutations have been detected.
cex: The amount by which plotting text and symbols should be magnified relative to the default. See also help(par).
legend: Allele frequencies and subpopulation-specific copy numbers are colored based on the chromosome on which the mutation is located (option: 'chr') or based on the average copy number of the locus in the sample (option: 'CN_Estimate').
orderBy: Loci within a subpopulation are sorted by genomic location (option: 'chr') or by the confidence with which they have been assigned to the subpopulation (option: '%maxP').
rawAF: Specifies whether the allele frequency of SNVs should be adjusted relative to the assigned subpopulation (options: TRUE, FALSE).
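The two orderBy options can be illustrated with base R's order(). In the sketch below, order_loci is a hypothetical helper. Placing larger subpopulations first is an assumption; the manual only states how loci are sorted within a subpopulation.

```r
# Hypothetical toy input: four point mutations in two subpopulations (SP).
dm <- cbind(chr = c(2, 1, 1, 3), startpos = c(500, 900, 100, 50),
            SP = c(0.8, 0.8, 0.3, 0.3), `%maxP` = c(0.9, 0.6, 0.7, 0.95))

# Sketch of the orderBy options. Grouping subpopulations by decreasing
# size is an assumption, not documented plotSPs behaviour.
order_loci <- function(dm, orderBy = c("chr", "%maxP")) {
  orderBy <- match.arg(orderBy)
  o <- if (orderBy == "chr") {
    order(-dm[, "SP"], dm[, "chr"], dm[, "startpos"])  # by genomic location
  } else {
    order(-dm[, "SP"], -dm[, "%maxP"])                  # most confident first
  }
  dm[o, , drop = FALSE]
}

by_chr  <- order_loci(dm, "chr")
by_conf <- order_loci(dm, "%maxP")
by_chr[, "startpos"]   # 900 500 100 50
by_conf[, "startpos"]  # 500 900 50 100
```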
Value
For each point mutation (x-axis) the function displays:
- the size of the subpopulation to which the mutation has been assigned (squares). Each square is colored based on the confidence with which the mutation has been assigned to the corresponding subpopulation (black - highest, white - lowest).
- the total count of all alleles at the mutated genomic locus in that subpopulation (dots).
- only for loci with a CNV and an SNV, each in distinct subpopulations: the total count of all alleles at the mutated genomic locus in the subpopulation which harbors the CNV (crosses).
- the allele frequency of the mutation (stars - somatic mutations, triangles - loss of heterozygosity).

Author(s)
Noemi Andor

roi    Regions of interest

Description
For internal use only. Default regional boundary for mutations included during clustering, comprising ca. 468 MB centered on the human exome. Relevant if the number of input mutations exceeds a user-defined threshold (often applies to whole genome sequencing data). A saved image of this object is in sysdata.rda.

Format
Numeric matrix in which each row corresponds to a genomic segment. Columns: chr - the chromosome of the segment; start - the first genomic position of the segment; end - the last genomic position of the segment.

Source
Data derived from human SureSelectExome_hg19 50MB library kit annotation.

See Also
runExPANdS

runExPANdS    Main Function
Description
Given a set of mutations, ExPANdS predicts the number of clonal expansions in a tumor, the size of the resulting subpopulations in the tumor bulk, and which mutations accumulate in a cell prior to its clonal expansion. Input parameters SNV and CBS hold the paths to tab-delimited files containing the point mutations and the copy numbers, respectively. Alternatively, SNV and CBS can be read into the workspace and passed to runExPANdS as numeric matrices. The robustness of the subpopulation predictions by ExPANdS increases with the number of mutations provided. It is recommended that SNV contains at least 200 point mutations to obtain stable results.

Usage
runExPANdS(SNV, CBS, maxS = 0.7, max_PM = 6, min_CF = 0.1, p = NA, ploidy = 2, nc = 1, plotF = 2, snvF = NULL, maxN = 8000, region = NA, verbose = T)

Arguments
SNV: Matrix in which each row corresponds to a point mutation. Only mutations located on autosomes should be included. Columns in SNV must be labeled and must include: chr - the chromosome on which each mutation is located; startpos - the genomic position of each mutation; AF_Tumor - the allele frequency of each mutation; PN_B - count of the B-allele in normal cells. A value of 0 indicates that the variant has only been detected in the tumor sample (i.e. a somatic mutation). A value of 1 indicates that the variant is also present in the normal (control) sample, albeit at reduced allele frequency (i.e. a germline variant that passed the calling filter due to the presence of an LOH event). Mutations for which the allele frequency in the tumor sample is lower than the corresponding allele frequency in the normal sample should not be included.
CBS: Matrix in which each row corresponds to a copy number segment. CBS is typically the output of a circular binary segmentation algorithm. Columns in CBS must be labeled and must include: chr - chromosome; startpos - the first genomic position of a copy number segment; endpos - the last genomic position of a copy number segment; CN_Estimate - the absolute copy number estimated for each segment.
maxS: Upper threshold for the noise score of subpopulation detection. Only subpopulations identified at a score below maxS are kept.
max_PM: Upper threshold for the number of amplicons per mutated cell. Increasing the value of this variable is not recommended unless extensive depth and breadth of coverage underlie the measurements of copy numbers and allele frequencies. See also cellfrequency_pdf.
min_CF: Lower boundary for the cellular prevalence interval of a mutated cell. Mutations for which allele frequency * copy number is below min_CF are excluded from further computation. Decreasing the value of this variable is not recommended unless extensive depth and breadth of coverage underlie the measurements of copy numbers and allele frequencies.
p: Precision with which subpopulation size is predicted; a small value reflects a high resolution and can lead to a higher number of predicted subpopulations.
plotF: Option for displaying a visual representation of the identified subpopulations (0 - no display; 1 - display subpopulation size; 2 - display subpopulation size and phylogeny).
snvF: Prefix of the file to which the predicted subpopulation composition will be saved. Default: the name of the file from which mutations have been read, or "out.expands" if input mutations are not handed over as a file path.
maxN: Upper limit for the number of point mutations used during clustering. If the number of user-supplied point mutations exceeds maxN, the clustering of cellular frequency distributions will be restricted to point mutations found within region.
region: Regional boundary for mutations included during clustering. Matrix in which each row corresponds to a genomic segment. Columns must include: chr - the chromosome of the segment; start - the first genomic position of the segment; end - the last genomic position of the segment. Default: SureSelectExome_hg19, comprising ca. 468 MB centered on the human exome. Alternative user-supplied regions should also be coding regions, as the selective pressure is higher there than in non-coding regions.
ploidy: The background ploidy of the sequenced sample (default: 2). Changing the value of this parameter is not recommended. Dealing with cell lines or tumor biopsies of very high (>= 0.95) tumor purity is a necessary but not sufficient condition to change the value of this parameter.
nc: The number of nodes to be forked to run R in parallel.
verbose: Give a more verbose output.

Value
List with fields:
finalSPs: Matrix of predicted subpopulations. Each row corresponds to a subpopulation and each column contains information about that subpopulation, such as its size in the sequenced tumor bulk (column MeanWeighted) and the noise score at which the subpopulation has been detected (column score).
dm: Matrix containing the input mutations with at least seven additional columns: SP - the subpopulation to which the point mutation has been assigned; SP_cnv - the subpopulation to which the CNV has been assigned (if a CNV exists at this locus); %maxP - the confidence of the mutation assignment; f - deprecated: the maximum likelihood cellular prevalence of this point mutation before it has been assigned to SP. This value is based exclusively on the copy number and allele frequency of the mutation and is independent of other point mutations. Column SP is less sensitive to noise and is considered the more accurate estimate of cellular mutation prevalence; PM - the total count of all alleles in the subpopulation harboring the point mutation (SP); PM_B - the count of the B-allele in the subpopulation harboring the point mutation (SP); PM_cnv - the total count of all alleles in the subpopulation harboring a CNV (SP_cnv); PM_B_cnv - the count of the B-allele in the CNV harboring subpopulation (SP_cnv). If phylogeny reconstruction was successful, the matrix includes one additional column for each subpopulation from the phylogeny, indicating whether or not the point mutation is present in the corresponding subpopulation.
densities: Matrix as obtained by computeCellFrequencyDistributions. Each row corresponds to a mutation and each column corresponds to a cellular frequency. Each value densities[i, j] represents the probability that mutation i is present in a fraction f of cells, where f is given by colnames(densities)[j].
sp_cbs: Matrix as obtained by assignQuantityToSP. Each row corresponds to a copy number segment, e.g. as obtained from a circular binary segmentation algorithm. Includes one additional column for each predicted subpopulation, containing the copy number of each segment in the corresponding subpopulation.
tree: An object of class "phylo" (library ape) as obtained by buildPhylo. Contains the inferred phylogenetic relationships between subpopulations.

Author(s)
Noemi Andor

References
Noemi Andor, Julie Harness, Sabine Mueller, Hans Werner Mewes and Claudia Petritsch. (2013) ExPANdS: Expanding Ploidy and Allele Frequency on Nested Subpopulations. Bioinformatics.

Examples
library(expands)
data(snv)
data(cbs)
maxS = 2.5
set.seed(4)
idx = sample(1:nrow(snv), 60, replace = FALSE)
# out = runExPANdS(snv[idx, ], cbs, maxS = maxS)

simulation    Simulated heterogeneous samples

Description
A total of 50 samples with various numbers of subpopulations per sample were simulated at variable noise rates and a constant number of 200 mutations per sample.

Usage
data(simulation)

Format
List with 50 entries, one per simulated sample. Subpopulation composition can be predicted for each sample and the predictions compared to the simulated entries: snv - the matrix of simulated point mutations (including ground truth columns SP*, PM*); cbs - the matrix of simulated copy number segments (including ground truth columns SP*); spstats - matrix of subpopulation statistics (ground truth).

Examples
library(expands)
data(simulation)
snvcols = c("chr", "startpos", "CN_Estimate", "AF_Tumor", "PN_B")
cbscols = c("chr", "startpos", "endpos")
sI = 1:50  # set to 1:200 to use all 200 mutations of the sample
# out = runExPANdS(simulation[[1]]$snv[sI, snvcols], simulation[[1]]$cbs[, cbscols], plotF = 0)
# truePhy = buildPhylo(simulation[[1]]$cbs, outF = 'truePhylo')  ## simulated
# predPhy = buildPhylo(out$sp_cbs, outF = 'predPhylo')  ## predicted
# par(mfrow = c(1, 2))
# plot(truePhy$tree, cex = 2, main = 'simulated')
# plot(predPhy$tree, cex = 2, main = 'predicted')

snv    Single Nucleotide Variations

Description
Somatic mutations and Loss of Heterozygosity (LOH) of a Glioblastoma tumor (TCGA-06-0152-01).

Usage
data(snv)

Format
Numeric matrix with 773 rows (one per mutation) and 7 columns: chr - the chromosome; startpos - genomic position; endpos - same as above; REF - ASCII code of the reference nucleotide (in hg18/hg19); ALT - ASCII code of the B-allele nucleotide; AF_Tumor - allele frequency of the B-allele; PN_B - count of the B-allele in normal cells. A value of 0 indicates that the mutation has only been detected in the tumor sample (i.e. somatic mutations). A value of 1 indicates that the variant is also present in the normal (control) sample, albeit at reduced allele frequency (i.e. a germline variant that passed the calling filter due to the presence of an LOH event). Other mutations should not be included.

Source
Data derived from The Cancer Genome Atlas (TCGA).
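The input rules stated for the SNV, min_CF, maxN and region arguments of runExPANdS can be applied as a pre-filter before calling it. The sketch below is hypothetical: prepare_snv is not part of expands. It assumes chromosomes are coded numerically (1 to 22 for autosomes, as in the snv dataset) and that CN_Estimate has already been attached, for example with assignQuantityToMutation. It also shows how the maxN limit would restrict the clustering input to mutations inside region.

```r
# Hypothetical pre-filter (not part of expands) mirroring the documented
# input rules for runExPANdS.
prepare_snv <- function(snv, min_CF = 0.1, maxN = 8000, region = NULL) {
  snv <- snv[snv[, "chr"] %in% 1:22, , drop = FALSE]      # autosomes only
  snv <- snv[snv[, "PN_B"] %in% c(0, 1), , drop = FALSE]  # somatic or LOH germline
  snv <- snv[snv[, "AF_Tumor"] * snv[, "CN_Estimate"] >= min_CF, , drop = FALSE]
  if (nrow(snv) < 200) warning("fewer than 200 point mutations; results may be unstable")
  clust <- snv
  # If more than maxN mutations remain, cluster only those inside `region`
  if (nrow(snv) > maxN && !is.null(region)) {
    inside <- vapply(seq_len(nrow(snv)), function(i)
      any(region[, "chr"] == snv[i, "chr"] &
          region[, "start"] <= snv[i, "startpos"] &
          region[, "end"] >= snv[i, "startpos"]), logical(1))
    clust <- snv[inside, , drop = FALSE]
  }
  list(snv = snv, clustering_input = clust)
}

# Toy data (hypothetical); maxN = 2 forces the region restriction
toy_snv <- cbind(chr = c(1, 1, 2, 23, 3), startpos = c(100, 5000, 300, 10, 700),
                 AF_Tumor = c(0.30, 0.02, 0.45, 0.50, 0.20),
                 CN_Estimate = c(2, 2, 2, 1, 3), PN_B = c(0, 0, 1, 0, 0))
toy_region <- cbind(chr = c(1, 3), start = c(1, 600), end = c(1000, 800))
res <- suppressWarnings(prepare_snv(toy_snv, maxN = 2, region = toy_region))
res$snv[, "startpos"]               # 100 300 700
res$clustering_input[, "startpos"]  # 100 700
```

The chromosome-23 row is dropped as non-autosomal, and the second row is dropped because 0.02 * 2 falls below min_CF. Of the remaining three mutations, only those inside the toy region reach clustering.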
Index
*Topic datasets: cbs, roi, simulation, snv
assignMutations, assignQuantityToMutation, assignQuantityToSP, buildMultiSamplePhylo, buildPhylo, cbs, cellfrequency_pdf, clusterCellFrequencies, computeCellFrequencyDistributions, gatherEXPANDSoutput, plotSPs, roi, runExPANdS, simulation, snv