Slide1
NGS applications and data analysis
Vladimir Teif
Intro to NGS analysis
Proficio course 2020
Slide2
NGS techniques vs. NGS applications
NGS techniques: how to sequence DNA (or RNA) (covered in lecture 1; funny recap in this video: https://www.youtube.com/watch?v=-7GK1HXwCtE)
NGS applications: how to design experiments in order to answer a specific biological question
Slide3
Examples of NGS applications
Figure adapted from http://www.scienceinschool.org
Figure labels: Hi-C; chromatin domains
Slide4
Types of NGS applications
RNA-seq, GRO-seq, CAGE, SAGE, CLIP-seq, Drop-seq: gene expression; non-coding RNA
ChIP-seq, MNase-seq, DNase-seq, ATAC-seq, etc.: protein binding; histone modifications; chromatin accessibility; nucleosome positioning
Bisulfite sequencing: DNA methylation
Hi-C, 3C, 4C, ChIA-PET, etc.: chromatin loops
Amplicon sequencing: targeted regions; phylogenomics; metagenomics
Whole Genome Sequencing (WGS): de novo assembly (new species or new analyses)
A curated bibliography of *seq methods (~100 methods) can be found at https://liorpachter.wordpress.com/seq/
Slide5
RNA-seq (RNA sequencing)
https://en.wikipedia.org/wiki/RNA-Seq
Slide6
ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing)
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks
4. Release DNA and submit for sequencing
Adapted from www.VisiScience.com
Slide7
MNase-seq (Micrococcal Nuclease digestion followed by sequencing)
MNase = Micrococcal Nuclease (enzyme that cuts DNA between nucleosomes)
Teif et al. (2012), Methods, 62, 26-38
Slide8
FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements, followed by sequencing)
Giresi et al. (2007), Genome Res. 17, 877–885
Slide9
DNase-seq (DNase I digestion followed by sequencing)
Wang et al. (2012), PLoS ONE 7, e42414
Slide10
ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing)
How transposase works: https://www.youtube.com/watch?v=XYZHMGUGq6o
Buenrostro et al. (2013), Nat Methods, 10, 1213-1218
Slide11
Methods for 1D genome mapping
Meyer & Liu, Nature Reviews Genetics 15, 709–721 (2014)
Slide12
Methods for 1D genome mapping
Tsompana and Buck, Epigenetics & Chromatin 2014, 7:33
Slide13
NGS methods for DNA methylation
Bisulfite sequencing
Affinity purification (e.g. MeDIP)
Slide14
Chromatin Conformation Capture methods to map locations of DNA-DNA loops
Rao et al., Cell 159, 1665–1680 (2014)
Slide15
Rivera and Ren (2013), Cell, 155, 39-55
Since 2017, DNA loops can be measured with 100-bp resolution (Bonev et al., Cell, 2017)
Slide16
Timeline of NGS methods
Bulk methods that require many cells
Single-cell methods
Rivera and Ren (2013), Cell, 155, 39-55
Hu et al., Front. Cell Dev. Biol., 2018
Slide17
Where to get NGS data?
Do your own experiment
Gene Expression Omnibus (GEO) https://www.ncbi.nlm.nih.gov/geo
Sequence Read Archive (SRA) https://www.ncbi.nlm.nih.gov/sra
European Nucleotide Archive (ENA) https://www.ebi.ac.uk/ena
The Cancer Genome Atlas (TCGA) https://tcga-data.nci.nih.gov/tcga
Exome Aggregation Consortium (ExAC) http://exac.broadinstitute.org/
You also have to upload your data!
Slide18
Next generation sequencing analysis
Slide19
How to analyze NGS data?
Ask a bioinformatician: you need to explain what you want, and for that you need to understand what can be done and how
Do it yourself:
Command line -> become a bioinformatician
Online wrappers -> simpler, but with file size limits
Example of a convenient online tool: Galaxy http://galaxy.essex.ac.uk/
Slide20
ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing)
1. Crosslink protein-DNA complexes in situ
2. Isolate nuclei and fragment DNA (sonication or digestion)
3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks
4. Release DNA and submit for sequencing
Adapted from www.VisiScience.com
Slide21
Experiment
Data analysis
http://www4.utsouthwestern.edu/mcdermottlab/NGS/index.html
Slide22
ChIP-seq data analysis
www.utsouthwestern.edu/labs.bioinformatics-core/analysis/chip-seq.png
Slide23
Unmapped sequenced reads (this is "raw", primary data):
Slide24
Mapped reads are characterised by their locations in the genome
Bowtie, BWA, ELAND, Novoalign, BLAST, ClustalW
TopHat (for RNA-seq)
Slide25
Reads can align to overlapping locations
http://biocluster.ucr.edu/~rkaundal/workshops/R_feb2016/ChIPseq/ChIPseq.html
We need to count all reads at each base pair
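Counting reads at each base pair can be sketched in a few lines. This is a toy illustration, not how a production tool does it: the function name `coverage` and the example intervals are made up for this sketch, and reads are assumed to be given as 0-based (start, end) pairs with the end excluded.

```python
from collections import Counter


def coverage(reads):
    """Return {position: number of reads covering it}.

    `reads` is a list of (start, end) pairs on one chromosome,
    0-based with `end` excluded (an assumption of this sketch).
    """
    depth = Counter()
    for start, end in reads:
        for pos in range(start, end):
            depth[pos] += 1  # each read adds 1 to every base it spans
    return dict(depth)


# Three hypothetical overlapping reads
reads = [(0, 5), (3, 8), (4, 6)]
for pos, d in sorted(coverage(reads).items()):
    print(pos, d)
```

Looping over every base of every read is fine for a handful of reads; for a real dataset a dedicated tool would be the practical choice.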
Slide26
ChIP-seq landscapes depend on the protein
Park P. J., Nature Reviews Genetics, 2009
Slide27
We can compare different experimental datasets for the same genomic region
Gifford et al., Cell 2013
5mC
Slide28
We can compare different experimental conditions in a genome browser
UCSC Genome Browser (online)
IGV (install on a local computer)
Jung et al., NAR 2014
Slide29
Systematic analysis requires identifying all peaks in all datasets and comparing the differences
Bardet et al. (2012), Nature Protocols, 7, 45-61
Slide30
Peak calling is a method to identify areas in a genome enriched with aligned reads
Wilbanks EG (2010), PLoS ONE 5, e11471.
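The idea of "areas enriched with aligned reads" can be illustrated with a deliberately naive sketch: report every contiguous stretch where the per-base read depth reaches a fixed cutoff. Real peak callers use statistical models rather than a fixed cutoff; the function name `call_regions`, the threshold, and the depth values below are all assumptions made for illustration.

```python
def call_regions(depth, threshold):
    """Return (start, end) runs, end excluded, where depth >= threshold.

    `depth` is a list of per-base read counts along one chromosome.
    """
    regions = []
    start = None
    for pos, d in enumerate(depth):
        if d >= threshold and start is None:
            start = pos                    # a candidate region opens
        elif d < threshold and start is not None:
            regions.append((start, pos))   # the region closes
            start = None
    if start is not None:                  # region runs to the end
        regions.append((start, len(depth)))
    return regions


# Hypothetical per-base depth profile
depth = [0, 1, 4, 6, 5, 1, 0, 3, 7, 7]
print(call_regions(depth, threshold=4))
```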
Slide31
Peak calling: finding the peaks
Pepke et al. (2009), Nature Methods, 6, S22–S32.
Input: a control sample prepared in the same way as the ChIP-seq sample but without the antibody, so it has no specific enrichment of the protein of interest
Slide32
Peak calling: defining statistical significance
Pepke et al. (2009), Nature Methods, 6, S22–S32.
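One way to make "statistical significance" concrete is to ask how likely the observed read count in a window would be under a background model. The sketch below assumes a Poisson background whose mean is the read count expected from the control sample; this model choice, the function name `poisson_sf`, and the numbers are assumptions for illustration, and the slide does not prescribe them.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam): the chance of seeing k or more
    reads in a window when only background is present."""
    cdf_below_k = sum(
        math.exp(-lam) * lam**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - cdf_below_k


# Hypothetical window: 20 ChIP reads observed, 5 expected from background
p_value = poisson_sf(20, 5.0)
print(f"p = {p_value:.2e}")
```

A small p-value says the window is unlikely to be background alone. Because many windows are tested across a genome, a correction for multiple testing is normally applied on top of this.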
Slide33
Peak calling: defining statistical significance
MACS (good for TFs)
SICER (histones, etc.)
HOMER (universal)
PeakSeq, edgeR, CisGenome
Is this peak statistically significant?
Park P. J., Nature Reviews Genetics, 2009
Slide34
Important: peaks are just genomic regions
Slide35
Genes are also genomic regions
DESeq, edgeR, Cuffdiff
Slide36
DNA methylation: also genomic regions
BISMARK, DMRcaller
Individual CpGs
Differentially methylated regions
Slide37
Any genomic regions can be intersected
BedTools (command line)
Galaxy (online)
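What "intersecting" two sets of regions means can be shown with a minimal sketch. It is not how BedTools is implemented; the function name `intersect`, the coordinates, and the (chromosome, start, end) tuple layout with the end excluded are assumptions for illustration.

```python
def intersect(regions_a, regions_b):
    """Return the overlapping parts of two region lists.

    Each region is (chrom, start, end), 0-based with `end` excluded.
    Compares every pair, which is fine for small lists only.
    """
    hits = []
    for chrom_a, start_a, end_a in regions_a:
        for chrom_b, start_b, end_b in regions_b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # non-empty overlap
                hits.append((chrom_a, start, end))
    return hits


# Hypothetical peaks and gene bodies
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
genes = [("chr1", 150, 550), ("chr2", 100, 300)]
print(intersect(peaks, genes))
```

The same logic applies whether the regions are peaks, genes, or differentially methylated regions, which is why treating everything as genomic intervals is convenient.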
Slide38
We can calculate the distribution of TF binding sites among different genomic features
Toropainen et al. (2016), Scientific Reports, 6, 33510
Slide39
We can also calculate enrichments of binding sites of our TF in different genomic regions
Mattout et al., Genome Biology, 2015
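A simple way to express such an enrichment is as a ratio: the fraction of binding sites that fall in a given type of region, divided by the fraction of the genome that this type of region occupies. The sketch below assumes this definition; the function name `fold_enrichment` and all numbers are hypothetical, and published analyses may use other definitions or add a significance test.

```python
def fold_enrichment(sites_in_feature, total_sites, feature_bp, genome_bp):
    """Observed fraction of sites in a feature divided by the fraction
    expected if sites were spread uniformly over the genome."""
    observed = sites_in_feature / total_sites
    expected = feature_bp / genome_bp
    return observed / expected


# Hypothetical numbers: 300 of 1000 sites fall in promoters,
# and promoters cover 30 Mb of a 3 Gb genome (1%)
ratio = fold_enrichment(300, 1000, 30_000_000, 3_000_000_000)
print(f"fold enrichment: {ratio:.1f}")
```

A ratio above 1 means the sites occur in that feature more often than expected by chance; a ratio below 1 indicates depletion.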
Slide40
…Or study the DNA sequence inside the peaks to find some common motifs
HOMER, MEME
Massie et al., EMBO J. (2011) 30, 2719–2733
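The simplest way to look for a shared motif is to count which short words (k-mers) recur across the peak sequences. The sketch below does only that; tools such as HOMER and MEME use more elaborate models and compare against background sequences. The function name `top_kmers` and the sequences are made up for illustration.

```python
from collections import Counter


def top_kmers(sequences, k, n=3):
    """Count every k-mer in the given sequences and return the n most
    frequent ones as (kmer, count) pairs."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1  # slide a window of length k
    return counts.most_common(n)


# Hypothetical sequences taken from under three peaks
peak_seqs = ["ACGTGACGTT", "TTGACGTCAA", "GGGACGTGCC"]
print(top_kmers(peak_seqs, k=5, n=1))
```

In this toy input the word GACGT occurs once in every sequence, so it comes out on top. A real analysis would also check that the word is rarer in sequences outside the peaks.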
Slide41
Motif enrichment analysis
MEME-ChIP
Slide42
What else can we do with peaks?
Compare two experimental conditions to see which peaks appear/disappear (e.g. protein binding gained/lost)
Compute associations of our protein with different genes (e.g. define which genes are regulated by this protein)
Study the DNA sequence inside the peaks (e.g. to find which other TFs co-bind with our protein of interest)
Examine how our peaks are arranged with respect to other peaks (e.g. to check for interactions with other proteins)
etc.
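The first item in the list above, finding peaks that appear or disappear between two conditions, can be sketched as follows. This treats a peak as "shared" if it overlaps any peak in the other condition by at least one base pair; that rule, the function names, and the coordinates are assumptions for illustration, and a real comparison would usually also take peak strength and replicates into account.

```python
def overlaps(p, q):
    """True if two (chrom, start, end) regions share at least one base
    (0-based coordinates, end excluded)."""
    return p[0] == q[0] and p[1] < q[2] and q[1] < p[2]


def compare_peaks(before, after):
    """Return (gained, lost) peaks between two conditions."""
    lost = [p for p in before if not any(overlaps(p, q) for q in after)]
    gained = [q for q in after if not any(overlaps(q, p) for p in before)]
    return gained, lost


# Hypothetical peak sets for two conditions
before = [("chr1", 100, 200), ("chr1", 500, 600)]
after = [("chr1", 150, 250), ("chr2", 10, 50)]
gained, lost = compare_peaks(before, after)
print("gained:", gained)
print("lost:", lost)
```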