The Encyclopedia of DNA Elements (ENCODE) Project
Elise A. Feingold, Ph.D.

Uploaded On 2016-03-20



Presentation Transcript

The Encyclopedia of DNA Elements (ENCODE) Project
Elise A. Feingold, Ph.D.
National Human Genome Research Institute
National Institutes of Health
AgENCODE Workshop, January 10, 2014

How can we “read” the human genome sequence?
- Genetic code, but no genomic code
- Evolutionary conservation helps to identify functionally important regions
  - ~5% conserved / ~1.5% protein coding
  - What is the function of noncoding conserved sequences?
  - What is the function of nonconserved sequences?
- Moderately good at identifying protein-coding regions, but fine structures difficult to predict from [...]
- Regulatory regions can be very far away from genes
- Need unbiased experimental investigation

ENCODE: Encyclopedia of DNA Elements
- Compile a comprehensive encyclopedia of all sequence features in the human genome and in the genomes of selected model organisms

Approach
- Apply lessons learned from the success of the Human Genome Project
- Start with well-defined pilot project
- Develop and test high-throughput technologies

Community Resources
- Use by research community to enhance understanding of:
  - regulation of gene expression on a spatial, temporal and quantitative level
  - genetic basis of disease
- Rapid prepublication data release
- Consortia publications
- Analysis requires development of:
  - Common data reporting formats
  - Data standards
  - Analytical tools

ENCODE Timeline

ENCODE Products
- “Marker” Papers: PLoS Biol (2011) 9:e1001046

modENCODE Publications
- 19 companion papers in Nature, Genome Research, Genome Biology and Database

ENCODE 2 Publications
- September 2012

modENCODE and ENCODE 2 Final Efforts
- modENCODE
  - Cross-species analyses: transcription, chromatin, regulation
  - Transfer of data and analyses to ENCODE 3
- ENCODE “2”
  - Mouse ENCODE
  - Cross-species analyses
  - Transfer of data and analyses to ENCODE 3

ENCODE Data
(Modified from PLoS Biol 9:e1001046, 2011)

ENCODE 2 Data
- Human data
  - 2,800 datasets
  - 200 cell types
  - 250 RNA-seq
  - 150 DNase
  - 1,100 transcription factor binding
  - 200 histone modification
  - DNAme
  - GENCODE mRNA
  - Functional characterization
- Mouse data
  - 600 datasets
  - 100 cell types
  - 100 RNA-seq
  - 50 DNase
  - 170 transcription factor binding
  - 170 histone modification

ENCODE Dimensions (Ewan Birney)
- Cells: 182 cell lines/tissues
- Methods/Factors: 164 assays (114 different ChIP)
- 3,010 experiments
- 5 terabases
- 1,716x of the human genome

ENCODE 2 Publications
(From www.nature.com/encode)
- More than 30 papers in Nature, Genome Research, Genome Biology, Science and Cell
- Publishing innovations
  - Threads of themes
  - Virtual machines
  - iPad app
- ENCODE increased our understanding of noncoding DNA and human disease

High-Level Findings
- Very large fraction of the genome is biochemically active
  - 80% of the genome has an ENCODE annotation in at least one cell type
  - Fraction that is functional: TBD
- GWAS SNPs are enriched within noncoding functional elements
  - 50% of noncoding GWAS SNPs are near ENCODE-defined regions
  - In many cases, disease phenotypes can be associated with a specific cell type or transcription factor
- Segmenting the genome into 7 chromatin states predicts ~400,000 enhancers and ~70,000 promoters as well as 1000s of quiescent states

Noncoding DNA Is Important For Disease And Evolution
- Noncoding DNA variants are known to cause human diseases
- Noncoding variants are known to cause changes in drug metabolism
- About 90% of GWAS findings lie outside of protein-coding regions
- More than 80% of recent adaptation signatures in three recent studies are not associated with protein-coding mutations
(Stamatoyannopoulos, Science 337:1190, 2012; Kingsley, Nature 484:55, 2012; Sabeti, Cell 152:703, 2013; Fraser, Genome Research, doi:10.1101/gr.152710.112, 2013)

Data Access
- www.encodeproject.org
- UCSC Genome Browser
- Ensembl
- www.modENCODE.org
- NCBI
- FlyBase
- WormBase

ENCODE Portal
http://encodeproject.org

Displaying data from ENCODE portal
http://encodeproject.org

ENCODE Experiment Matrix
http://encodeproject.org

ENCODE Data Standards
http://encodeproject.org

ENCODE Software Tools
http://encodeproject.org

Publications
http://encodeproject.org

[Chart: Cumulative ENCODE Publications Over Time. Y axis: number of publications. Series: papers from non-ENCODE authors; papers from ENCODE 2 production groups.]

[Chart: Cumulative Publications Using ENCODE Data by Non-ENCODE Authors. Y axis: number of publications. Categories: basic biology; methods development; human disease.]

Use of ENCODE Data in Linking Genotype to Phenotype
- ENCODE data can be used in hypothesis generation and refinement
  - What is the causal variant?
  - What is the target gene?
  - What is the target cell type?
  - How does the variant alter the phenotype?

Social Media
- Facebook: ENCODE (ENCyclopedia Of DNA Elements)
- Twitter: @ENCODE_NIH

ENCODE Tutorial Pages
http://www.genome.gov/27553900

ENCODE 3
- Catalog is incomplete
- Only a small fraction of transcription factors studied
- Deeper analysis across many additional cell types (more primary cells) needed
- Additional data types need to be studied, e.g., RNA-binding proteins, lncRNAs

ENCODE 3 Solicitation
- Comprehensive catalogs of functional elements
- Existing capacity for high-throughput, efficient production
- Centralized production, management and coordination
- 7 high priority scientific areas
- More integrated data coordination and analysis
- Primary focus on human, secondary focus on mouse
- Fly/worm allowed if demonstrate need for:
  - highly centralized effort for specific data type
- Work to be undertaken as part of highly interactive consortium

Priority areas
- Maps of all classes of functional RNA molecules
- Fine structural genome annotation (of the human and mouse genomes only) by improving gene models
- Maps of sites of open chromatin
- Maps of selected histone marks and other relevant chromatin proteins
- Maps of sites of DNA methylation
- Maps of all functional sequence elements within RNA molecules
- Maps of the binding sites for more transcription factors, using a minimum of two cell types for each previously unstudied factor, and additional, well justified cell types as resources permit
  - For transcription factors for which binding site maps already exist, development of maps in additional cell types will be considered, but will be of lower priority, and expansion of this data set must be strongly justified

ENCODE 3 Structure
[Diagram labels: Data Coordination Center; Data Analysis Center; Analysis Working Group; Gene Models; TF Binding; Element; Chromatin States; Histone; DNase; DNAme; RBP Binding; Computational Analysis Groups; Technology Development Groups; Data Production Groups]

Project Management
- Monthly teleconference calls
- Working groups to address specific issues
- Data Analysis Working Groups
- Annual meetings
- Project oversight by external advisors

Individual Project Management
- Yearly quantitative milestones
- Quarterly progress reports
- Track status of experiments and data submission to identify bottlenecks
- Track costs
- Additional narrative section to track nonquantitative milestones, e.g., technology development, and to discuss bottlenecks

Participants
- Groups funded by ENCODE solicitations
- Open to additional data production or data [...] agreeing to criteria for participation
  - [...]-wide analysis
  - Full participation in Consortium activities
  - Abide by data release policy
  - Demonstrated funding source
- Encourage inter-consortia collaborations
- Encourage other collaborations/coordination

ENCODE Consortium Activities (Nature 489:49, 2012)
[Diagram labels: Operational; Human Subjects; Human Resources; Mouse Resources; Policies/Logistics; Data Release and Publications; Outreach; Functional Characterization and Validation; Data Coordination, Analysis, and Interpretation; Analysis Working Group; Datatype-Specific Coordination; Peak Calling; ChIP/CLIP/RIP-seq; DNase; RNA; Binding; DAC; EDCAC; Consortium; Production PI; ENCODE Wiki]

Lessons Learned
- Plan data collection
- Develop focused project goals and target end users in advance
- Employ high-throughput, robust methods
- Keep production and technology development pipelines separate
- Centralize data collection to the extent possible to maximize economies of scale and consistent data quality
- Generate data on common samples to the extent possible
  - Consider centralized sample collection/distribution
  - Very powerful to have multiple data types on same samples
- Develop metadata useful for people outside of project
- Develop experimental standards, data quality metrics and uniform data processing
  - Especially needed if multiple groups are generating data using same experimental assays
- Ensure high (known) data quality
- Perform data quality evaluation on ongoing basis

Lessons Learned
- Devote sufficient resources to bioinformatics (data storage, processing and analysis)
- Don’t assume that organism-specific community will come together on its own for analysis without dedicated support
- Be realistic about data analysis and publication timeline
  - Overestimate by at least 2X
- Create centralized mode of sharing information, e.g., wiki sites, Google Docs

Lessons Learned
- Need for significant, centralized management
- Explicit, written guidelines, standards and rules, e.g., policies for data release, publications
- Balance needs of individual investigators with those of Consortium
  - Retain ability to publish independently
- Focus on global data production and analysis
  - Beware of focus on individual research agendas and “interesting biology”
- Foster collegial interactions
- Encourage diversity of opinions
- Keep consortium open and bring in needed expertise
- Avoid “group think”
- Have explicit process for decision making

Summary
- Set clear goals, articulate to community
- Maximize utility of data to the community
- Rapid prepublication data release
- High (knowable) data quality
- Data standards
- Interoperability with other projects, especially metadata
- Take advantage of high-throughput production capabilities to maximize economies of scale
- Open consortium
- Set and monitor production milestones
- Facilitate communication between data production groups and computational analysis
- Devote sufficient resources (data production, analysis and infrastructure)

AgENCODE Considerations
- Focused goals
- Number of species
- Quality of genome sequence
- Number of individuals per species
- Number of phenotypes
- Number of tissues/cell types

ENCODE Production Centers
- Bradley Bernstein (John Rinn, Manolis Kellis)
- Thomas Gingeras (Carrie Davis, Roderic Guigo)
- Brenton Graveley (Christopher Burge, Xiang-Dong Fu, Eugene Yeo)
- Richard Myers (Devin Absher, Gregory Cooper, Shawn Levy, Florencia Pauli Behn, Ross Hardison, Ali Mortazavi, Timothy Reddy, Barbara Wold)
- Bing Ren (Joseph Ecker, Len Pennacchio, Axel Visel, Wei Wang)
- Michael Snyder (Kevin White, Sherman Weissman, Peggy Farnham)
- John Stamatoyannopoulos (Ralph Hansen, Rajinder Kaul, Patrick Navas, George Stamatoyannopoulos, Piper Treuting, Michael Bender, Job Dekker, Mark Groudine)

ENCODE Data Coordination Center
- Mike Cherry (Jim Kent)

ENCODE Data Analysis Center
- Zhiping Weng (Mark Gerstein, Manolis Kellis, Roderic Guigo, Rafael Irizarry, Xiaole Shirley Liu, William Stafford Noble)

Additional ENCODE Participants
- Timothy Hubbard (Mark Gerstein, Roderic Guigo, Jen Harrow, Rachel Harte, David Haussler, Manolis Kellis, Alexandre Reymond, Stephen Searle, Alfonso Valencia)
- David Gilbert (Tamer Kahveci)

ENCODE 3

ENCODE Computational Analysis Groups
- Peter Bickel (Haiyan Huang, Leonard Lipovich, Bin Yu)
- David Gifford (Tommi Jaakkola)
- Sunduz Keles (Emery Bresnick, Colin Dewey)
- Robert Klein (Christina Leslie, Souma Raychaudhuri, Ross Levine, Kenneth Offit)
- Jonathan Pritchard (Yoav Gilad)
- Xinshu Xiao

ENCODE Technology Development Groups
- Christopher Burge (Wendy Gilbert, Brenton Graveley, Robert Horvitz)
- Barak Cohen and Joseph Corbo
- Peggy Farnham (Victor Jin, David Jay Segal)
- R. David Hawkins
- Christina Leslie (Christopher Mason)
- Jason Lieb (Karen Mohlke, Eran Segal)
- Mats Ljungman (Thomas Wilson)
- Tarjei Mikkelsen
- Jay Shendure and Nadav Ahituv (Michael McManus)
- Alexey Wolfson
- Cheng Yuan (Stuart Orkin)

… and many senior scientists, postdocs, students, technicians, computer scientists, statisticians and administrators in these groups

Current ENCODE participants: http://www.genome.gov/26525220

The ENCODE 3 Consortium

NHGRI Staff
- Program Directors: Elise Feingold, Peter Good, Michael Pazin
- Deputy Director: Mark Guyer
- Division Director: Jeff Schloss
- Program Analysts: Sherry Zhou, Preetha Nandi