Slide1
University of Brawijaya
4th December 2013
Understanding the Human Genome: Lessons from the ENCODE project
Austen Ganley
INMS
Slide2
Slide3
Glossary
Genome
Genes
DNA/RNA
Protein
Cell
Transcription
Chromatin
Histones
Nucleosomes
Non-coding RNA
Sequencing
Microarray
Transcription start site
Active/open
Inactive/repression
Slide4
promoter
transcriptional start site
exon
intron
transcriptional terminator
Slide5
Introduction
Scientists from many individual laboratories worked together as a consortium
The aim was to understand 1% of the human genome in the pilot phase (2007), and 100% in the full project (2012)
Looked at:
Transcription
Chromatin/transcription factors
Replication
Evolution
Slide6
Genes
Now estimated to be about 21,000 protein-coding genes (taking up about 3% of the whole genome)
In addition, there are about 9,000 small RNAs (microRNAs among them), and about 10,000 long non-coding RNAs
Slide7
Transcription
Transcription was measured by two different methods:
Whole genome microarrays
RNA-sequencing
Slide8
Detecting transcription using tiled microarrays
Slide9
Transcription
Transcription was measured by two different methods:
Whole genome microarrays
RNA-sequencing
They found at least 62% of the whole genome is transcribed (remember, genes only account for about 3% of the whole genome)
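The comparison between the transcribed fraction and the genic fraction comes down to measuring how much of the genome a set of intervals covers. The following is a minimal sketch of that arithmetic, not the ENCODE analysis itself: the function name `covered_fraction`, the 1,000 bp toy genome, and the interval coordinates are all made up for illustration, and were chosen so the toy numbers echo the 62% and 3% on the slide.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a genome covered by half-open (start, end) intervals.

    Overlapping intervals are merged first so no base is counted twice.
    """
    covered = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: bank it and start a new one.
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or adjacent: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start
    return covered / genome_length


# Hypothetical toy data on a 1,000 bp "genome" (not ENCODE data).
transcripts = [(0, 300), (250, 500), (700, 820)]
exons = [(100, 115), (400, 415)]

transcribed = covered_fraction(transcripts, 1000)
genic = covered_fraction(exons, 1000)
print(f"transcribed: {transcribed:.0%}, genic: {genic:.0%}")
```

Merging before summing matters because transcripts overlap heavily; adding up raw transcript lengths would overstate the transcribed fraction.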
Slide10
Transcriptional start sites
Goal is to identify the transcription start sites
Not easy to do!
Use a technique called CAGE (Cap Analysis Gene Expression)
Slide11
CAGE
Makes use of the 5′ cap on mRNA
First, mRNA is reverse-transcribed to form cDNA (an RNA-DNA hybrid)
Then, biotin is attached to the 5′ cap, and the cDNA is fragmented
The biotin-labelled fragments are isolated (representing the 5′ end of mRNA), and these fragments are sequenced
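Once the 5′ fragments have been sequenced and mapped, each read marks a position where a transcript began, and nearby positions can be grouped into candidate transcription start sites. The sketch below shows one simple way this grouping could be done. It is an illustration under stated assumptions, not the method used by ENCODE: the function name `call_tss_clusters`, the `max_gap` and `min_tags` parameters, and the tag positions are hypothetical, and strand is ignored.

```python
from collections import Counter


def call_tss_clusters(tag_starts, max_gap=20, min_tags=3):
    """Group mapped 5' tag positions into candidate start-site clusters.

    Returns a list of (start, end, n_tags, peak) tuples, where peak is the
    position with the most tags in the cluster.
    """
    counts = Counter(tag_starts)
    clusters = []
    current = []
    for pos in sorted(counts):
        # A gap larger than max_gap closes the current cluster.
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    result = []
    for positions in clusters:
        n_tags = sum(counts[p] for p in positions)
        # Weakly supported clusters are treated as noise and dropped.
        if n_tags >= min_tags:
            peak = max(positions, key=lambda p: counts[p])
            result.append((positions[0], positions[-1], n_tags, peak))
    return result


# Hypothetical 5' tag positions on one chromosome.
tags = [100, 100, 101, 103, 100, 500, 900, 905, 905, 905]
clusters = call_tss_clusters(tags)
print(clusters)
```

The lone tag at position 500 is discarded because a single read is weak evidence for a start site; the threshold is a design choice, not a fixed rule.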
Slide12
About 60,000 transcription start sites found
Only half of these match known genes
What do the other ones do? May explain high level of transcription
The transcription start sites are often far upstream of the gene start, and can overlap genes
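Deciding whether a transcription start site "matches a known gene" means comparing its position with annotated gene starts. A minimal sketch of that comparison follows, assuming a single chromosome, ignoring strand, and using a hypothetical `tolerance` of 50 bp; the function name and all coordinates are invented for illustration.

```python
import bisect


def fraction_matching(tss_positions, gene_starts, tolerance=50):
    """Fraction of start sites lying within `tolerance` bp of a gene start."""
    gene_starts = sorted(gene_starts)
    matched = 0
    for tss in tss_positions:
        # Only the annotated starts just below and just above can be nearest.
        i = bisect.bisect_left(gene_starts, tss)
        neighbours = gene_starts[max(i - 1, 0):i + 1]
        if any(abs(tss - g) <= tolerance for g in neighbours):
            matched += 1
    return matched / len(tss_positions)


# Hypothetical coordinates: two annotated genes, six detected start sites.
annotated = [1000, 5000]
detected = [990, 1020, 3000, 5040, 7000, 8000]
share = fraction_matching(detected, annotated)
print(f"{share:.0%} of start sites match a known gene")
```

In this toy example half of the detected start sites fall near an annotated gene start, mirroring the proportion reported on the slide.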
Slide13
Overlapping Genes
Transcriptional start sites from the DONSON gene
An overlapping gene, starting far upstream
The DONSON gene is a known gene
However, some transcripts start in the ATP5O gene, and include some ATP5O exons
Two genes lying in between are skipped
Slide14
Chromatin: histones and nucleosomes
Nucleosomes are formed from DNA that is wrapped around histones
Histones are a set of proteins that usually associate as an octamer
www.palaeos.com/Eukarya/Eukarya.Origins.5.html
www.mun.ca/biochem/courses/3107/Topics/supercoiling.html
Slide15
DNase I hypersensitive sites (DHS)
Gilbert, Developmental Biology, Sinauer
Hebbes Lab, University of Portsmouth, UK
DNase I preferentially digests nucleosome-depleted regions (DNase I hypersensitive sites)
These are associated with gene transcription
Chromatin is digested with DNase I, which only digests nucleosome-free regions
The remaining DNA is isolated, and put on a microarray or sequenced
This finds the open, active regions of the genome
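When the readout is sequencing, hypersensitive regions show up as places where DNase I cuts pile up far more densely than elsewhere. The sketch below illustrates the idea with fixed windows and a fold-over-average threshold. It is a simplified illustration, not a real peak caller: the function name `hypersensitive_windows`, the `window` and `fold` parameters, and the cut positions are all assumptions.

```python
def hypersensitive_windows(cut_sites, genome_length, window=100, fold=3.0):
    """Return (start, end) windows whose cut count exceeds fold x the mean."""
    n_windows = -(-genome_length // window)  # ceiling division
    counts = [0] * n_windows
    for pos in cut_sites:
        counts[pos // window] += 1
    mean = sum(counts) / n_windows
    return [
        (i * window, min((i + 1) * window, genome_length))
        for i, c in enumerate(counts)
        if c > fold * mean
    ]


# Hypothetical cut positions on a 1,000 bp region: dense cutting in 200-299,
# one background cut in each of the other windows.
cuts = list(range(200, 300, 5)) + [50, 150, 350, 450, 550, 650, 750, 850, 950]
sites = hypersensitive_windows(cuts, genome_length=1000)
print(sites)
```

Comparing each window with the genome-wide average is the simplest possible background model; real data would also need to account for mappability and sequencing depth.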
Slide16
DNase I hypersensitive sites
In total, there are about 3 million DNase I hypersensitive sites in the genome, covering about 15% (versus about 40,000 genes covering about 4%)
Transcriptional start sites are regions of DNase I hypersensitivity, as expected
However, most DNase I hypersensitive sites are not associated with a transcriptional start site
Slide17
Genome
Transcribed region
DNase I hypersensitive region
Transcription start sites
Genes
Slide18
Histone Modification Effects
Modifications occur on the histone tails
They alter the strength of DNA-histone binding, and influence the binding of other proteins to the DNA
Thus they can activate or silence gene expression
Slide19
The “Histone Code”
The combination of histone modifications determines a gene’s transcriptional status – the histone code
Some modifications are associated with active gene expression
H3K4me2
H3K4me3
H3ac
H4ac
Some with repression
H3K27me3
H3K4me1
www.nature.com/nrm/index.html
Slide20
ChIP (Chromatin immunoprecipitation)
Method to find where your protein of interest is binding
You cross-link the sample, and fragment the DNA into pieces
Immunoprecipitate using an antibody to your protein of interest
Reverse the cross-links, and isolate the DNA
To find where in the genome the protein was bound:
Hybridise the DNA to a microarray (ChIP-chip) OR sequence it (ChIP-seq)
www.rndsystems.com/product_detail_objectname_exactachip_assayprinciple.aspx
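For ChIP-seq, the bound regions are those where the immunoprecipitated sample has many more reads than a control sample. A minimal sketch of that comparison is given here, assuming reads have already been counted in fixed genomic bins and that an input (no antibody) control is available; the function name `enriched_bins`, the `min_fold` and `pseudocount` parameters, and the counts are hypothetical.

```python
def enriched_bins(chip_counts, input_counts, min_fold=4.0, pseudocount=1.0):
    """Indices of bins where depth-normalised ChIP signal exceeds the control."""
    chip_total = sum(chip_counts)
    input_total = sum(input_counts)
    hits = []
    for i, (chip, control) in enumerate(zip(chip_counts, input_counts)):
        # Normalise by library size; the pseudocount avoids division by zero.
        chip_rate = (chip + pseudocount) / chip_total
        input_rate = (control + pseudocount) / input_total
        if chip_rate / input_rate >= min_fold:
            hits.append(i)
    return hits


# Hypothetical read counts in five bins for the ChIP sample and the control.
chip = [2, 3, 90, 3, 2]
control = [20, 20, 20, 20, 20]
bound = enriched_bins(chip, control)
print(bound)
```

Dividing by the control rather than using raw ChIP counts guards against regions that simply sequence well in every sample.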
Slide21
Histone modification profiles
They found that histone modifications associated with active transcription were enriched around transcription start sites
They found that histone modifications associated with gene repression were depleted around transcription start sites
This is as expected
Around DNase I hypersensitive sites not near transcription start sites, they found almost the opposite pattern
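Enrichment or depletion "around" a class of sites is usually shown as an average profile: the signal for a histone mark is lined up on every site and averaged position by position. The following sketch shows that averaging under simplifying assumptions (one chromosome, per-base signal in a list, strand ignored); the function name `average_profile`, the `flank` parameter, and the signal values are made up for illustration.

```python
def average_profile(signal, sites, flank=2):
    """Average signal at each offset from -flank to +flank around the sites."""
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for site in sites:
        # Skip sites whose flanks would run off either end of the signal.
        if site - flank < 0 or site + flank >= len(signal):
            continue
        for offset in range(-flank, flank + 1):
            totals[offset + flank] += signal[site + offset]
        used += 1
    return [t / used for t in totals] if used else totals


# Hypothetical per-base signal for an active mark, with two start sites.
mark_signal = [0, 0, 1, 4, 1, 0, 0, 0, 1, 6, 1, 0]
start_sites = [3, 9]
profile = average_profile(mark_signal, start_sites)
print(profile)
```

A peak in the middle of the profile corresponds to enrichment at the sites; a dip would correspond to depletion. Running the same function on distal DNase I hypersensitive sites instead of start sites gives the second profile to compare against.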
Slide22
Enrichment of active histone marks and depletion of inactive histone marks at a transcription start site
Enrichment of inactive histone marks but little enrichment of active histone marks at a DNase I hypersensitive site
Slide23
Histone modification profiles
They also found other patterns
Combining all the results (plus results for transcription factor binding), they say that the human genome is divided into seven different types of chromatin states
Which state it is depends on what combination of histone modifications/transcription factor binding there is
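The idea that a state follows from a combination of marks can be pictured as a lookup from the set of marks present on a segment to a label. The sketch below is only a toy illustration of that idea and is not the segmentation method ENCODE used: the function name `assign_state`, the rule table, and the segments are all hypothetical, and the pairing of particular marks with particular states is an assumption made for the example.

```python
def assign_state(marks, rules, default="unassigned"):
    """Look up a chromatin state from the exact combination of marks present."""
    return rules.get(frozenset(marks), default)


# Hypothetical rule table: which combination of marks maps to which state.
rules = {
    frozenset({"H3K4me3", "H3ac"}): "promoter",
    frozenset({"H3K4me1"}): "enhancer",
    frozenset({"H3K27me3"}): "inactive region",
}

# Hypothetical genome segments and the marks detected on each.
segments = [
    ("segment_1", ["H3ac", "H3K4me3"]),
    ("segment_2", ["H3K4me1"]),
    ("segment_3", ["H3K27me3"]),
    ("segment_4", []),
]

states = {name: assign_state(marks, rules) for name, marks in segments}
print(states)
```

Using a `frozenset` as the key makes the lookup independent of the order in which marks are listed, which matches the notion of a combination.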
Slide24
The seven chromatin states
Slide25
The seven chromatin states
Promoter (red)
Enhancer (yellow)
Gene body (green)
Inactive region (grey)
Slide26
Grand Summary
ENCODE
Transcription:
• a lot of non-coding transcription (~60% of the genome transcribed) – much more than needed just to transcribe all the genes
Transcription start sites:
• twice as many transcription start sites as traditional “genes”
• transcripts span large regions, even between genes
DNase I hypersensitive sites:
• more than just at transcription start sites
• two types: those found at TSS, and those found at other regions
• these have different chromatin profiles
Histone modifications:
• active marks correlate with TSS/DHS
• distal DHS have a different histone modification profile
Chromatin states:
• the genome can be divided into seven different types
• these are determined by the combination of histone modifications and transcription factor binding that occur
Overview:
• genome can be generalised into seven different states
• the function of some of these states is known – e.g. promoter
• the function of others is not known, but may explain the high level of transcription and open chromatin structure