NGS applications and data analysis

Uploaded On 2022-08-03

Presentation Transcript

Slide1

NGS applications and data analysis

Vladimir Teif

Intro to NGS analysis

Proficio course 2020

Slide2

NGS techniques vs. NGS applications

NGS techniques: how to sequence DNA (or RNA) (covered in lecture 1; funny recap in this video: https://www.youtube.com/watch?v=-7GK1HXwCtE)

NGS applications: how to design experiments in order to answer a specific biological question

Slide3

Examples of NGS applications

Figure adapted from http://www.scienceinschool.org (figure labels: Hi-C; chromatin domains)

Slide4

Types of NGS applications

RNA-seq, GRO-seq, CAGE, SAGE, CLIP-seq, Drop-seq: gene expression; non-coding RNA

ChIP-seq, MNase-seq, DNase-seq, ATAC-seq, etc.: protein binding; histone modifications; chromatin accessibility; nucleosome positioning

Bisulfite sequencing: DNA methylation

Hi-C, 3C, 4C, ChIA-PET, etc.: chromatin loops

Amplicon sequencing: targeted regions; phylogenomics; metagenomics

Whole Genome Sequencing (WGS): de novo assembly (new species or new analyses)

A curated bibliography of *seq methods (~100 methods) can be found at https://liorpachter.wordpress.com/seq/

Slide5

RNA-seq (RNA sequencing)

https://en.wikipedia.org/wiki/RNA-Seq

Slide6

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing)

1. Crosslink protein-DNA complexes in situ

2. Isolate nuclei and fragment DNA (sonication or digestion)

3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks

4. Release DNA and submit for sequencing

Adapted from www.VisiScience.com

Slide7

MNase-seq (Micrococcal Nuclease digestion followed by sequencing)

Teif et al. (2012), Methods, 62, 26-38

MNase = Micrococcal Nuclease (enzyme that cuts DNA between nucleosomes)

Slide8

FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements followed by sequencing)

Giresi et al. (2007), Genome Res. 17, 877–885

Slide9

DNase-seq (DNase I digestion followed by sequencing)

Wang et al. (2012), PLoS ONE 7, e42414

Slide10

How transposase works: https://www.youtube.com/watch?v=XYZHMGUGq6o

ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing)

Buenrostro et al. (2013), Nat. Methods 10, 1213-1218

Slide11

Methods for 1D genome mapping

Meyer & Liu, Nature Reviews Genetics 15, 709–721 (2014)

Slide12

Methods for 1D genome mapping

Tsompana and Buck, Epigenetics & Chromatin 2014, 7:33

Slide13

NGS methods for DNA methylation

Bisulfite sequencing

Affinity purification (e.g. MeDIP)

Slide14

Chromatin Conformation Capture methods to map locations of DNA-DNA loops

Rao et al., Cell 159, 1665–1680 (2014)

Slide15

Rivera and Ren (2013), Cell, 155, 39-55

Since 2017, DNA loops can be measured with 100-bp resolution (Bonev et al., Cell, 2017)

Slide16

Timeline of NGS methods

Rivera and Ren (2013), Cell, 155, 39-55

Hu et al., Front. Cell Dev. Biol., 2018

Bulk methods that require many cells

Single-cell methods

Slide17

Where to get NGS data?

Do your own experiment

Gene Expression Omnibus (GEO)

https://www.ncbi.nlm.nih.gov/geo

Sequence Read Archive (SRA)

https://www.ncbi.nlm.nih.gov/sra

European Nucleotide Archive

https://www.ebi.ac.uk/ena

The Cancer Genome Atlas (TCGA)

https://tcga-data.nci.nih.gov/tcga

Exome Aggregation Consortium (ExAC)

http://exac.broadinstitute.org/

You also have to upload your data!

Slide18

Next generation sequencing analysis

Slide19

How to analyze NGS data?

Ask a bioinformatician

You need to explain what you want, and for that you need to understand what can be done and how

Do it yourself

Command line -> become a bioinformatician

Online wrappers -> simpler, but with file size limits

Example of a convenient online tool: Galaxy http://galaxy.essex.ac.uk/

Slide20

ChIP-seq (Chromatin Immunoprecipitation followed by sequencing)

1. Crosslink protein-DNA complexes in situ

2. Isolate nuclei and fragment DNA (sonication or digestion)

3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks

4. Release DNA and submit for sequencing

Adapted from www.VisiScience.com

Slide21

Experiment

Data analysis

http://www4.utsouthwestern.edu/mcdermottlab/NGS/index.html

Slide22

ChIP-seq data analysis

www.utsouthwestern.edu/labs.bioinformatics-core/analysis/chip-seq.png

Slide23

Unmapped sequenced reads (this is the “raw”, primary data)
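To make the idea of raw reads concrete, here is a minimal sketch that reads records from FASTQ text. It assumes the raw data are in the common four-line FASTQ layout with Phred+33 quality encoding; the slide itself does not name a file format, and the function names and example reads below are made up for illustration.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) from a FASTQ stream.

    Assumes the simple four-line-per-record layout (no wrapped sequences).
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file (or trailing blank line)
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        yield header[1:], sequence, quality


def mean_phred(quality, offset=33):
    """Average base quality, assuming Phred+33 encoding."""
    return sum(ord(char) - offset for char in quality) / len(quality)


# Two invented reads, only for illustration.
EXAMPLE = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCA\n+\n!!!!!\n"
records = list(parse_fastq(io.StringIO(EXAMPLE)))

for read_id, sequence, quality in records:
    print(read_id, sequence, mean_phred(quality))
```

At this stage a read has a name, a sequence and per-base qualities, but no genomic location yet.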

Slide24

Mapped reads are characterised by their locations in the genome

Bowtie, BWA, ELAND, Novoalign, BLAST, ClustalW

TopHat (for RNA-seq)
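As an illustration of what "location in the genome" means in practice, the sketch below extracts the chromosome, position and strand from one alignment record. It assumes the aligner output is in SAM text format (which aligners such as Bowtie and BWA can write); the `MappedRead` type, the function name and the example line are hypothetical.

```python
from typing import NamedTuple, Optional


class MappedRead(NamedTuple):
    name: str
    chrom: str
    start: int   # 1-based leftmost mapped position, as stored in SAM
    strand: str
    mapq: int


def parse_sam_line(line: str) -> Optional[MappedRead]:
    """Return the genomic location of one SAM record, or None if there is none."""
    if line.startswith("@"):
        return None  # header line, not an alignment
    fields = line.rstrip("\n").split("\t")
    flag = int(fields[1])
    if flag & 0x4:
        return None  # flag bit 0x4: read is unmapped
    strand = "-" if flag & 0x10 else "+"  # flag bit 0x10: reverse strand
    return MappedRead(fields[0], fields[2], int(fields[3]), strand, int(fields[4]))


# An invented alignment record, only for illustration.
EXAMPLE_LINE = "read1\t16\tchr1\t1000\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
print(parse_sam_line(EXAMPLE_LINE))
```

In real projects a dedicated library or samtools would normally be used instead of splitting lines by hand.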

Slide25

Reads can align to overlapping locations

http://biocluster.ucr.edu/~rkaundal/workshops/R_feb2016/ChIPseq/ChIPseq.html

We need to count all reads at each base pair
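Counting reads at each base pair can be sketched as follows. This is a minimal illustration that assumes reads are given as 0-based, half-open (start, end) intervals on a single chromosome; the function name and the toy reads are not from the lecture.

```python
def per_base_coverage(reads, region_length):
    """Number of reads covering each position of a region.

    reads: iterable of (start, end) intervals, 0-based and half-open (assumption).
    Uses a difference array: +1 where a read starts, -1 where it ends.
    """
    diff = [0] * (region_length + 1)
    for start, end in reads:
        start = max(start, 0)
        end = min(end, region_length)
        if start >= end:
            continue  # read lies outside the region
        diff[start] += 1
        diff[end] -= 1

    coverage = []
    running = 0
    for delta in diff[:region_length]:
        running += delta
        coverage.append(running)
    return coverage


# Three overlapping toy reads on a 10-bp region.
toy_reads = [(0, 4), (2, 6), (3, 8)]
print(per_base_coverage(toy_reads, 10))
```

The resulting list is the kind of signal that is plotted as a track in a genome browser.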

Slide26

ChIP-seq landscapes depend on the protein

Park P. J., Nature Genetics, 2009

Slide27

We can compare different experimental datasets for the same genomic region

Gifford et al., Cell 2013

5mC

Slide28

We can compare different experimental conditions in a genome browser

UCSC Genome Browser (online)

IGV (install on a local computer)

Jung et al., NAR 2014

Slide29

Systematic analysis requires identifying all peaks in all datasets and comparing the differences

Bardet et al. (2012) Nature Protocols, 7, 45-61

Slide30

Peak calling is a method to identify areas in a genome enriched with aligned reads

Wilbanks EG (2010) PLoS ONE 5, e11471.

Slide31

Peak calling: finding the peaks

Pepke et al. (2009). Nature Methods, 6, S22–S32.

Input: a sample prepared in the same way as the ChIP-seq sample, but without adding the antibody, so it has no specific enrichment of our protein of interest
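The role of the Input control can be illustrated with a deliberately naive sketch: report stretches where the ChIP coverage exceeds the Input coverage by a chosen fold. This is not the algorithm of any published peak caller; the function name, the fold threshold and the pseudocount are assumptions made only for this example.

```python
def find_enriched_regions(chip, control, min_fold=2.0, pseudocount=1.0):
    """Return (start, end) half-open regions where ChIP/Input >= min_fold.

    chip, control: per-base coverage lists of equal length.
    pseudocount avoids division by zero where Input coverage is 0 (assumption).
    """
    if len(chip) != len(control):
        raise ValueError("coverage tracks must have the same length")

    regions = []
    region_start = None
    for pos, (chip_cov, input_cov) in enumerate(zip(chip, control)):
        enriched = (chip_cov + pseudocount) / (input_cov + pseudocount) >= min_fold
        if enriched and region_start is None:
            region_start = pos  # a candidate peak opens here
        elif not enriched and region_start is not None:
            regions.append((region_start, pos))  # the candidate peak closes
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(chip)))
    return regions


# Toy coverage tracks, invented for illustration.
chip_track = [1, 1, 8, 9, 10, 2, 1, 7, 7, 1]
input_track = [1, 1, 2, 2, 2, 1, 1, 1, 1, 1]
print(find_enriched_regions(chip_track, input_track))
```

Real peak callers additionally normalise for sequencing depth and attach a statistical significance to each candidate region.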

Slide32

Peak calling: defining statistical significance

Pepke et al. (2009). Nature Methods, 6, S22–S32.
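One way to attach a significance to a candidate peak is to ask how likely the observed read count would be under a background model. The sketch below assumes a Poisson background with a known expected count per window; the slide does not specify a model, so this is an illustrative choice, and the function name is hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected).

    observed: read count in the candidate window.
    expected: background read count for a window of the same size (assumption:
    estimated elsewhere, e.g. from the Input sample).
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    term = math.exp(-expected)  # P(X = 0)
    cumulative = 0.0
    for i in range(observed):
        cumulative += term           # add P(X = i)
        term *= expected / (i + 1)   # step to P(X = i + 1)
    return max(0.0, 1.0 - cumulative)


# A window with 20 reads where the background predicts 2 reads.
print(poisson_upper_tail(20, 2.0))
```

Subtracting from 1 loses precision for extremely small p-values, and a genome-wide analysis would also need a correction for multiple testing; both are left out of this sketch.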

Slide33

Peak calling: defining statistical significance

MACS (good for TFs)

SICER (histones, etc.)

HOMER (universal)

PeakSeq, edgeR, CisGenome

Park P. J., Nature Genetics, 2009

Is this peak statistically significant?

Slide34

Important: peaks are just genomic regions

Slide35

Genes are also genomic regions

DESeq, edgeR, Cuffdiff

Slide36

DNA methylation: also genomic regions

BISMARK

DMRcaller

Individual CpGs

Differentially methylated regions

Slide37

Any genomic regions can be intersected

BedTools (command line)

Galaxy (online)
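What "intersecting genomic regions" computes can be sketched in a few lines. This is a minimal illustration, not the BedTools implementation: it assumes both sets are on the same chromosome, use half-open (start, end) coordinates, and contain no overlapping intervals within a set. The function name and the toy coordinates are made up.

```python
def intersect_intervals(regions_a, regions_b):
    """Return the overlapping parts of two sets of intervals.

    Assumptions: same chromosome, half-open (start, end) coordinates,
    and no overlaps within each individual set.
    """
    a = sorted(regions_a)
    b = sorted(regions_b)
    overlaps = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            overlaps.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlaps


# Toy peaks and genes, invented for illustration.
peaks = [(100, 200), (300, 400), (500, 600)]
genes = [(150, 350), (550, 700)]
print(intersect_intervals(peaks, genes))
```

The same operation applies whether the regions are peaks, genes or differentially methylated regions, which is why treating everything as genomic intervals is convenient.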

Slide38

We can calculate the distribution of TF binding sites among different genomic features

Toropainen et al. (2016) Scientific Reports, 6, 33510

Slide39

We can also calculate enrichments of binding sites of our TF in different genomic regions

Mattout et al., Genome Biology, 2015
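A simple way to express such an enrichment is the ratio between the observed fraction of peaks in a feature and the fraction expected if peaks were spread uniformly over the genome. The sketch below uses this definition as an assumption; the cited paper may compute enrichment differently, and all numbers are invented.

```python
import math


def fold_enrichment(peaks_in_feature, total_peaks, feature_bp, genome_bp):
    """Observed / expected fraction of peaks falling into a genomic feature.

    Expected fraction assumes peaks are distributed uniformly along the
    genome, i.e. proportionally to the feature's total length (assumption).
    """
    if total_peaks <= 0 or feature_bp <= 0 or genome_bp <= 0:
        raise ValueError("counts and lengths must be positive")
    observed = peaks_in_feature / total_peaks
    expected = feature_bp / genome_bp
    return observed / expected


# Invented example: 300 of 1000 peaks fall in promoters that cover 2% of the genome.
enrichment = fold_enrichment(300, 1000, 2_000_000, 100_000_000)
print(enrichment, math.log2(enrichment))
```

Values above 1 (log2 above 0) indicate that the feature contains more binding sites than expected by chance under this simple model.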

Slide40

…Or study the DNA sequence inside the peaks to find some common motifs

HOMER, MEME

Massie et al., EMBO J. (2011) 30, 2719–2733
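The idea of looking for common motifs can be illustrated by counting short words (k-mers) shared between peak sequences. This is only a toy sketch and not how HOMER or MEME work, since those tools use statistical motif models and background comparisons; the function name and the sequences are invented.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count in how many sequences each k-mer occurs at least once."""
    counts = Counter()
    for sequence in sequences:
        sequence = sequence.upper()
        seen = set()
        for i in range(len(sequence) - k + 1):
            kmer = sequence[i:i + k]
            if "N" in kmer:
                continue  # skip windows with undetermined bases
            seen.add(kmer)
        counts.update(seen)  # each sequence contributes at most 1 per k-mer
    return counts


# Invented peak sequences that share the word GACGTC.
peak_sequences = ["TTGACGTCAAA", "CCTGACGTCAG", "GGGGTGACGTC"]
kmer_counts = count_kmers(peak_sequences, 6)
print(kmer_counts.most_common(3))
```

A k-mer present in most peaks but rare in background sequences is a candidate binding motif worth checking with a dedicated motif-discovery tool.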

Slide41

Motif enrichment analysis

MEME-ChIP

Slide42

What else can we do with peaks?

Compare two experimental conditions to see which peaks appear/disappear (e.g. protein binding gained/lost)

Compute associations of our protein with different genes (e.g. define which genes are regulated by this protein)

Study the DNA sequence inside the peaks (e.g. to find which other TFs co-bind with our protein of interest)

Look at how our peaks are arranged with respect to other peaks (e.g. to check for interactions with other proteins)

etc.