Slide1
Methylation for Clinical Diagnosis
Garrett Jenkinson, PhD
Data Scientist, Assistant Professor of Biomedical Informatics
Division of Computational Biology
Department of Quantitative Health Sciences
Slide2
Genetics versus Epigenetics
Which is more different phenotypically at the cellular level?
Your heart cells from your brain cells
A monkey’s heart cells from your heart cells
How about the old nature versus nurture question?
Two healthy but unrelated people’s livers
Identical twins’ livers when only one is an alcoholic
Regulation of gene expression can be as critical as the underlying genomic sequence
A gene can be turned off by regulation, which can be functionally the same as obliterating the genomic sequence
Not all regulation is epigenetic, and causality is rarely understood... don’t be oversold
Slide3
Motivation
DNA methylation is crucial if you want to understand:
Developmental biology, stem cells, differentiation
Carcinogenesis, imprinting disorders
Aging, environmental exposures
Slide4
Biology of DNA methylation in mammals
Covalent addition of a methyl group to carbon 5 (C5) of the cytosine ring (5mC)
Predominantly at CG dinucleotides (CpG sites)
Slide5
CpG sites
The “p” represents the phosphate backbone, to distinguish a CpG dinucleotide along one strand from C-G hydrogen bonding between the strands of DNA
The only positions in the human genome with known mechanisms for epigenetic inheritance past cell division (DNMT enzymes)
Dense regions of CpG sites are referred to as CpG islands, which are flanked by shores, then shelves, and then CpG-depleted open seas
Methylated islands in promoters are linked to repressed gene expression
Methylation has complicated relationships to chromatin structure and gene expression
Mechanistic understanding of DNA methylation in gene regulation is incomplete
Slide6
Agouti Mouse Model
Genetically identical mice; phenotype differences are driven by differences in methylation at the agouti gene
Expose pregnant mice to bisphenol A (BPA, found in plastic products)
More yellow, obese progeny than would normally be expected
DNA methylation at the agouti gene sites is decreased (hypomethylated)
Need sequencing methods to probe the state of DNA methylation
Slide7
Detailed View of Bisulfite Sequencing
https://software.broadinstitute.org/software/igv/sites/cancerinformatics.org.igv/files/SL_IGV_bisulfiteflow2.png
Slide8
QC and Alignment of BS-seq data
Need specialized algorithms/tools to deal with “heavily mutated” BS-seq data
Trim Galore! is a package that wraps cutadapt and allows for the trimming of low-quality bases and adapters from sequencing reads
Bismark is a bisulfite-aware aligner using bowtie2
Can also produce QC and methylation summarization information
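To see why BS-seq reads look “heavily mutated” to a standard aligner, here is a minimal Python sketch of in-silico bisulfite conversion on one strand: unmethylated cytosines are read as T, while methylated cytosines stay C. The function names and the toy sequence are illustrative assumptions and are not part of Trim Galore! or Bismark.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment followed by PCR on one strand.

    Unmethylated C is read as T; methylated C (5mC) is protected and stays C.
    `methylated_positions` holds 0-based indices of methylated cytosines.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def mismatches(a, b):
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


reference = "ACGTCCGATCGA"
# Hypothetical toy example: only the CpG cytosine at index 1 is methylated.
read = bisulfite_convert(reference, methylated_positions=[1])
print(read)
print(mismatches(reference, read))
```

Every unmethylated C becomes an apparent C-to-T mismatch against the reference, which is why a bisulfite-aware aligner is needed rather than a general-purpose one.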
Slide9
Post-alignment Data in IGV
Slide10
Common BS-seq methods
WGBS: completely unfocused
Comprehensive, ~13 million CpG sites profiled
Gold standard
~$5K per sample
RRBS: 1% of genome with 1.5 million CpGs
Most common BS-seq
Restriction enzymes chop DNA, resulting in enrichment for CGIs
~$500 per sample
“Capture” protocols (e.g., EPIC TruSeq): 3 million CpGs
Least common
Looks more like “focused” WGBS
Slide11
“Raw” Data
Slide12
Methylation status not as “fixed” as genetic
Populations of genetically homogeneous cells can and do differ in methylation
Maternal and paternal alleles can and do differ (e.g., imprinting)
At a given time, a given CpG site in each cell’s DNA is either methylated (1) or unmethylated (0), but the state can change during the life of the cell
End result: we talk of the probability that a CpG site is methylated in a given tissue/sequencing run
Slide13
Marginal Estimation
X_n = 1 if the nth site is methylated
X_n = 0 if unmethylated
P_n(1) = Pr[X_n = 1]
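A minimal Python sketch of marginal estimation, assuming the simplest estimator: P_n(1) is taken as the fraction of reads covering site n that report it as methylated. The read encoding (1, 0, or None for an uncovered site) and the function name are assumptions made for illustration.

```python
def marginal_estimates(reads):
    """Estimate P_n(1) = Pr[X_n = 1] at each CpG site from aligned reads.

    Each read is a list with one entry per CpG site: 1 (methylated),
    0 (unmethylated), or None (site not covered by this read).
    Returns a list of (estimate, coverage); estimate is None at zero coverage.
    """
    n_sites = len(reads[0])
    results = []
    for n in range(n_sites):
        calls = [r[n] for r in reads if r[n] is not None]
        coverage = len(calls)
        estimate = sum(calls) / coverage if coverage else None
        results.append((estimate, coverage))
    return results


# Hypothetical toy data: 4 reads over 3 CpG sites.
reads = [
    [1, 1, None],
    [1, 0, 0],
    [None, 1, 0],
    [1, 1, None],
]
print(marginal_estimates(reads))
```

Reporting the coverage next to each estimate matters because an estimate from two reads is far noisier than one from forty.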
Slide24
Smoothed Marginals
Use smoothing to improve marginal estimates
[Figure: raw estimates versus smoothed estimates]
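The slides do not say which smoother is used, so the following Python sketch is only one plausible choice: a coverage-weighted average with a triangular kernel over nearby CpG sites. The kernel shape, the `bandwidth` parameter, and the function name are all assumptions.

```python
def smooth_marginals(positions, meth_counts, coverages, bandwidth):
    """Coverage-weighted kernel smoothing of raw methylation estimates.

    positions: genomic coordinate of each CpG site
    meth_counts: number of methylated reads at each site
    coverages: total number of reads at each site
    bandwidth: kernel half-width in base pairs (an assumed tuning parameter)
    """
    smoothed = []
    for center in positions:
        num = 0.0
        den = 0.0
        for pos, m, c in zip(positions, meth_counts, coverages):
            # Triangular kernel: weight falls linearly to zero at `bandwidth`.
            w = max(0.0, 1.0 - abs(pos - center) / bandwidth)
            num += w * m
            den += w * c
        smoothed.append(num / den if den > 0 else None)
    return smoothed


# Hypothetical toy data: the middle site has a single unmethylated read.
positions = [100, 110, 120]
meth_counts = [9, 0, 8]
coverages = [10, 1, 10]
raw = [m / c for m, c in zip(meth_counts, coverages)]
print(raw)
print(smooth_marginals(positions, meth_counts, coverages, bandwidth=30))
```

In this toy case the raw estimate at the middle site rests on one read, and smoothing pulls it toward its well-covered neighbors.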
Slide35
“Marginal probabilities” from other tech
A lower-cost approach to estimating these probabilities is to use 450K or 850K microarrays
The concept is the same: detect the conversion or non-conversion SNVs introduced by bisulfite
Use array bioinformatics methods like ChAMP to produce “beta values”, which are qualitatively equivalent to marginal probabilities of methylation at a CpG
Mature technology/pipelines
The lowest-cost approach is MLPA (not bisulfite)
Only able to profile dozens of targeted CpGs, but in widespread clinical use
Slide36
Detect hypo- or hyper-methylation at disease loci
Example disorder: Prader-Willi/Angelman Syndrome
PWS: loss of function of paternally expressed genes
AS: loss of maternally expressed UBE3A
Both can present as loss of imprinting (loss of the normal 50% probability of methylation) in specific loci on chr15
Example disorder: Fragile X Syndrome
CGG trinucleotide repeat expansion in the 5’ UTR of the FMR1 gene
Hypermethylation in the FMR1 promoter of males can be diagnostic
Slide37
Tumor methylation signatures can have clinical decision-making value
Glioblastoma has a very poor prognosis
Temozolomide is an alkylating agent used as chemotherapy to induce damage to tumor cells’ DNA
MGMT is a DNA repair enzyme that can inhibit the efficacy of such therapies
Patients with a hypermethylated MGMT promoter are more responsive to these therapeutics
Slide38
Methylation signatures: viewing marginal probabilities as a vector
Stack p_1, p_2, ..., p_N into a vector p for each sample
Suppose samples have a label y
y ∈ {0,1} for normal, diseased
y ∈ {1,2,3,4,...,D} for D different disease states or phenotypes
Build a machine learning classifier f that takes in p to predict y ≈ f(p)
f() is estimated using SVMs, random forests, neural networks, etc.
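To make y ≈ f(p) concrete, here is a small Python sketch that uses a nearest-centroid classifier as a deliberately simple stand-in for the SVMs, random forests, and neural networks named above. The training vectors, labels, and function names are hypothetical.

```python
def fit_centroids(vectors, labels):
    """Average the methylation vectors p within each label y."""
    sums, counts = {}, {}
    for p, y in zip(vectors, labels):
        if y not in sums:
            sums[y] = [0.0] * len(p)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], p)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, p):
    """Return the label whose centroid is closest to p (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda y: sq_dist(centroids[y], p))


# Hypothetical toy data: N = 3 CpG sites, y = 0 (normal) or 1 (diseased).
train_p = [[0.1, 0.2, 0.9], [0.2, 0.1, 0.8], [0.9, 0.8, 0.1], [0.8, 0.9, 0.2]]
train_y = [0, 0, 1, 1]
f = fit_centroids(train_p, train_y)
print(predict(f, [0.15, 0.25, 0.85]))
print(predict(f, [0.7, 0.9, 0.3]))
```

With real data N can reach hundreds of thousands of sites for only a few hundred samples, so a practical classifier would also need feature selection or regularization, which this sketch omits.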
Slide39
Genome-wide classifiers for complex developmental disorders
Slide40
Diagnoses of dozens of brain tumors assisted by DNAm classifier
Slide41
The frontier: methylation as a coordinated phenomenon
Rarely do we care about methylation at a single CpG site... we often care about an entire island’s coordinated behavior
To the extent people care about single sites, it is due to the highly correlated/coordinated behavior of a site with its neighbors
The “marginal” view of methylation as a probability at each site is inadequate to capture the richness and diversity of the underlying biology
Slide42
Stochasticity: Epipolymorphism/Entropy
Landan et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat. Genet. 2012
Slide43
Joint Probability Distributions
Need to talk about probabilities of patterns of CpG sites
From such probabilities, any other quantity of interest is available
Epipolymorphism
Entropy
Now possible to detect not just hypo- or hyper-methylation changes in the mean, but any difference in methylation behavior
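As a sketch of how such quantities follow from a joint distribution, the Python below computes epipolymorphism (taken here as the probability that two independently drawn patterns differ, 1 minus the sum of squared pattern probabilities), Shannon entropy in bits, and a single-site marginal. The two-site distribution is a made-up example, and the function names are assumptions.

```python
import math


def epipolymorphism(pattern_probs):
    """Probability that two independently drawn patterns differ: 1 - sum(p^2)."""
    return 1.0 - sum(p * p for p in pattern_probs.values())


def shannon_entropy(pattern_probs):
    """Shannon entropy in bits of the pattern distribution."""
    return -sum(p * math.log2(p) for p in pattern_probs.values() if p > 0)


# Hypothetical joint distribution over patterns of 2 CpG sites.
joint = {"00": 0.4, "01": 0.1, "10": 0.1, "11": 0.4}
print(epipolymorphism(joint))
print(shannon_entropy(joint))
# The marginal at the first site is also recoverable from the joint distribution.
p1 = sum(p for pattern, p in joint.items() if pattern[0] == "1")
print(p1)
```

Both marginals in this example are 0.5, yet the joint distribution shows the two sites are strongly coupled, which is exactly the information a marginal-only view discards.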
Slide44
Empirical Estimation
Slide46
Pattern Probability = 1/10
Pattern Probability = 2/10
Pattern Probability = 1/10
Pattern Probability = 1/10
Pattern Probability = 1/10
Pattern Probability = 1/10
Pattern Probability = 3/10
The 1017 other patterns are assigned zero probability
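The numbers above are consistent with 10 reads over N = 10 CpG sites, since 7 observed patterns plus 1017 unobserved ones make 2^10 = 1024. A minimal Python sketch of the empirical estimator follows; the specific read patterns are invented, chosen only so the fractions match the slide.

```python
from collections import Counter


def empirical_pattern_probs(reads):
    """Estimate each pattern's probability as its observed read fraction."""
    counts = Counter(reads)
    total = len(reads)
    return {pattern: count / total for pattern, count in counts.items()}


# Hypothetical reads over N = 10 CpG sites, chosen to give the slide's fractions.
reads = [
    "1111111111", "1111111111", "1111111111",
    "0000000000", "0000000000",
    "1111100000", "0000011111", "1010101010", "0101010101", "1100110011",
]
probs = empirical_pattern_probs(reads)
n_sites = 10
unobserved = 2 ** n_sites - len(probs)
print(sorted(probs.values()))
print(unobserved)
```

The weakness is visible in the last number: with typical coverage, almost every possible pattern is never seen and so receives probability exactly zero.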
Slide47
Ising model
Each read is a single-cell measurement, even in bulk sequencing
Means and nearest-neighbor correlations are frequently observed
The 1D Ising model is the MaxEnt model consistent with these quantities
A well-studied model in statistical physics with many existing computational techniques/results
Provides the full joint distribution
Slide48
Ising model performance
Empirical and marginal methods under- and over-estimate heterogeneity, respectively
The Ising model is accurate even with little data
Slide49
Ising model specification
All patterns have non-zero probability
General model requires estimation of the a_n and c_n parameters; (2N-1) << 2^N
Improve performance further by imposing parametric structure based on the biology
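A brute-force Python sketch of a 1D Ising joint distribution, assuming the common parameterization in which P(x) is proportional to exp(sum_n a_n(2x_n - 1) + sum_n c_n(2x_n - 1)(2x_{n+1} - 1)). The slides do not spell out the formula, so this form, the parameter values, and the function name are assumptions. Enumerating all 2^N patterns is only feasible for small N and is meant purely as an illustration.

```python
import itertools
import math


def ising_joint(a, c):
    """Brute-force joint distribution of a 1D Ising model over N binary sites.

    a: list of N per-site parameters a_n
    c: list of N-1 nearest-neighbor coupling parameters c_n
    Assumed form: P(x) proportional to
        exp( sum_n a_n*(2x_n - 1) + sum_n c_n*(2x_n - 1)*(2x_{n+1} - 1) ).
    """
    n = len(a)
    weights = {}
    for x in itertools.product((0, 1), repeat=n):
        s = [2 * xi - 1 for xi in x]  # map {0,1} to spins {-1,+1}
        log_weight = sum(a[i] * s[i] for i in range(n))
        log_weight += sum(c[i] * s[i] * s[i + 1] for i in range(n - 1))
        weights[x] = math.exp(log_weight)
    z = sum(weights.values())  # partition function
    return {x: w / z for x, w in weights.items()}


# Hypothetical parameters for N = 4 sites: 2N - 1 = 7 numbers describe 2^N = 16 patterns.
a = [0.5, 0.0, -0.5, 0.2]
c = [1.0, 1.0, 1.0]
joint = ising_joint(a, c)
print(len(joint), sum(joint.values()))
print(min(joint.values()) > 0)
```

Because every pattern gets an exponential weight, no pattern has zero probability, and positive c_n values favor neighboring sites sharing the same state.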
Slide50
Normalized Methylation Entropy
Rigorously quantifies stochasticity in DNA methylation using Shannon entropy
Another degree of freedom compared to standard mean analyses
Shown to have discriminatory power in aging, carcinogenesis and stem cell differentiation
Slide51
Jensen-Shannon distance
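The slide gives only the name, so the Python sketch below uses the standard definition: the square root of the Jensen-Shannon divergence computed with base-2 logarithms, which lies between 0 (identical distributions) and 1 (no shared patterns). The two example distributions and the function names are hypothetical, and a given software package may define the quantity over a different variable.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits between dicts of pattern probabilities."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), a value in [0, 1]."""
    keys = set(p) | set(q)
    p = {k: p.get(k, 0.0) for k in keys}
    q = {k: q.get(k, 0.0) for k in keys}
    m = {k: 0.5 * (p[k] + q[k]) for k in keys}  # mixture distribution
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))


# Hypothetical pattern distributions over 2 CpG sites in two samples.
sample_a = {"00": 0.5, "11": 0.5}
sample_b = {"00": 0.25, "01": 0.25, "10": 0.25, "11": 0.25}
print(jensen_shannon_distance(sample_a, sample_b))
```

Both example samples have a mean methylation of 0.5 at each site, so a mean-based comparison sees no difference, while the distance between their pattern distributions is clearly non-zero.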
Slide52
Information Theoretic Bioinformatics Software
informME is an information-theoretic package designed to implement the Ising model, NME, and JSD
Available as a thoroughly used/tested MATLAB/C++ code base, with bash wrappers and SLURM/SGE submission scripts
informME.jl was recently released as a trial package in the Julia language, requiring no licensing or complex pipelines
Slide53
AML demonstrates hyper-entropic and hypo-methylation signatures
Slide54
Dysregulation of epigenome in ALL
Slide55
UHRF1 identified as epigenetic regulator in ALL, linking translocation subtypes
Slide56
Single-cell RNA-seq confirms central role with translocation driver genes
Slide57
Example Application
Slide58
Highlights of DNA methylation in twins study
Twin astronauts with similar past flight experience were studied in detail during the longest American spaceflight in history
Surprising result: the space twin globally had less DNA methylation variability than the ground twin; hypotheses why?
Slide59
Focal changes in DNA methylation
Less surprising results when looking for focal genes with DNA methylation differences:
Regulation of ossification, cellular response to ultraviolet-B (UV-B), and platelet aggregation
Somatostatin signaling pathway and regulation of superoxide anion generation
Response to platelet-derived growth factor (PDGF) and T cell differentiation and activation pathways
Slide60
Example Detailed Analysis of NOTCH3
Slide61
Papers for more detail or applications
Slide62
Questions?