Vladimir Teif, BS312 Genome Bioinformatics, Lecture 5: Next generation sequencing analysis; chromatin basics reminder.
1. Chromatin basics & ChIP-seq analysis. Vladimir Teif. BS312 – Genome Bioinformatics. Lecture 5
2. Next generation sequencing analysis
3. Chromatin basics: reminder. Figure: https://micro.magnet.fsu.edu/cells/nucleus/images/chromatinstructurefigure1.jpg
4. Transcription factor-centric view. Figure labels: transcription factor (TF) concentrations; protein assembly at regulatory regions; transcription start site; proteins produced (including TFs). Teif et al. (2013), Methods, 62, 26-38
5. Transcription factor-centric view. Figure labels: transcription factor (TF) concentrations; promoter; enhancer; RNA polymerase (enzyme which makes RNA); proteins produced (including TFs). Teif et al. (2013), Methods, 62, 26-38
6. Histone modifications-centric view. Turner B.M. (2005), Nature Structural & Molecular Biology, 12, 110-112
7. Histone modifications-centric view. Figure: http://dev.biologists.org/content/139/6/1045
8. NGS methods and their applications. Figure labels: Hi-C; chromatin domains. Figure adapted from http://www.scienceinschool.org
9.
10. ChIP-seq (Chromatin Immunoprecipitation followed by sequencing). 1. Crosslink protein-DNA complexes in situ. 2. Isolate nuclei and fragment DNA (sonication or digestion). 3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks. 4. Release DNA and submit for sequencing. Adapted from www.VisiScience.com
11. MNase-seq (Micrococcal Nuclease digestion followed by sequencing). MNase = Micrococcal Nuclease (enzyme that cuts DNA between nucleosomes). Teif et al. (2013), Methods, 62, 26-38
12. FAIRE-seq (Formaldehyde-Assisted Isolation of Regulatory Elements, followed by sequencing). Giresi et al. (2007), Genome Res. 17, 877–885
13. DNase-seq (DNase I digestion followed by sequencing). Wang et al. (2012), PLoS ONE 7, e42414
14. How transposase works: https://www.youtube.com/watch?v=XYZHMGUGq6o ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) Buenrostro et al. (2013) Nat Methods. 10, 1213-1218
15. Methods for 1D genome mapping. Meyer & Liu, Nature Reviews Genetics 15, 709–721 (2014)
16. Methods for 1D genome mapping. Tsompana and Buck, Epigenetics & Chromatin 2014, 7:33
17. Timeline of NGS methods: bulk methods that require many cells, and single-cell methods. Rivera and Ren (2013), Cell, 155, 39-55; Hu et al. (2018), Front. Cell Dev. Biol.
18. Where to get NGS data? Do your own experiment, or use public repositories: Gene Expression Omnibus, GEO (https://www.ncbi.nlm.nih.gov/geo); Sequence Read Archive, SRA (https://www.ncbi.nlm.nih.gov/sra); European Nucleotide Archive (https://www.ebi.ac.uk/ena); The Cancer Genome Atlas, TCGA (https://tcga-data.nci.nih.gov/tcga); Exome Aggregation Consortium, ExAC (http://exac.broadinstitute.org/). You also have to upload your data!
19. How to analyze NGS data? Option 1: ask a bioinformatician. You need to explain what you want, and for that you need to understand what can be done and how. Option 2: do it yourself. Command line -> become a bioinformatician. Online wrappers -> simpler, but with file size limits. Example of a convenient online tool: Galaxy, http://galaxy.essex.ac.uk/
20. ChIP-seq (Chromatin ImmunoPrecipitation followed by sequencing). 1. Crosslink protein-DNA complexes in situ. 2. Isolate nuclei and fragment DNA (sonication or digestion). 3. Immunoprecipitate with antibody against target nuclear protein and reverse crosslinks. 4. Release DNA and submit for sequencing. Adapted from www.VisiScience.com
21. Experiment and data analysis. http://www4.utsouthwestern.edu/mcdermottlab/NGS/index.html
22. ChIP-seq analysis workflow. www.utsouthwestern.edu/labs.bioinformatics-core/analysis/chip-seq.png
23. NGS data after sequencing but before mapping (.fastq file aka “raw” data):
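As an illustration of what a .fastq file contains, here is a minimal Python sketch of a reader. It assumes the common four-line layout (header starting with "@", sequence, separator starting with "+", quality string) and Phred+33 quality encoding; neither is stated on the slide, and the names `FastqRecord`, `parse_fastq` and `phred33_scores` are made up for this example.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str       # header line without the leading "@"
    sequence: str   # base calls
    quality: str    # one quality character per base


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield reads from FASTQ text, assuming four lines per read."""
    stripped = (line.rstrip("\r\n") for line in lines)
    for header in stripped:
        if not header:
            continue  # tolerate blank lines between records
        block = [next(stripped, None) for _ in range(3)]
        if None in block:
            raise ValueError(f"truncated FASTQ record: {header!r}")
        sequence, separator, quality = block
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred33_scores(quality: str) -> List[int]:
    """Decode a quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality]


# Two hypothetical reads, written inline instead of read from a file.
example_lines = [
    "@read1", "GATTACA", "+", "IIIIII#",
    "@read2", "ACGT", "+", "!!II",
]
records = list(parse_fastq(example_lines))
for record in records:
    print(record.name, record.sequence, phred33_scores(record.quality))
```

For a real file, the same function could be fed an open file handle, since it only iterates over lines.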
24. Mapping with Bowtie. Manual: http://bowtie-bio.sourceforge.net/manual.shtml Options: -v <N> allow no more than N mismatches, where N may be a number from 0 through 3; -p <N> use N computer processors/cores in parallel; -m <N> disregard reads with more than N possible alignments.
25. Guess what this command does: "bowtie -v 2 -p 2 -m 1 mm9 filename.fastq filename.map". Options: -v <N> allow no more than N mismatches, where N may be a number from 0 through 3; -p <N> use N computer processors/cores in parallel; -m <N> disregard reads with more than N possible alignments.
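One way to read the command is to spell out each option as a named parameter. The sketch below only builds the argument list and prints it; it does not run Bowtie. The function name `bowtie_command` and its parameter names are hypothetical, while the three options and the order of the positional arguments (index, reads file, output file) are taken from the slide.

```python
import shlex
from typing import List


def bowtie_command(index: str, reads: str, output: str,
                   max_mismatches: int = 2,
                   threads: int = 2,
                   max_alignments: int = 1) -> List[str]:
    """Build the Bowtie call from the slide as a list of arguments."""
    if not 0 <= max_mismatches <= 3:
        raise ValueError("-v accepts values from 0 through 3")
    return [
        "bowtie",
        "-v", str(max_mismatches),   # allow at most this many mismatches
        "-p", str(threads),          # processors/cores used in parallel
        "-m", str(max_alignments),   # drop reads with more alignments than this
        index,                       # genome index, here mm9
        reads,                       # input .fastq file
        output,                      # output file with mapped reads
    ]


command = bowtie_command("mm9", "filename.fastq", "filename.map")
print(shlex.join(command))
```

Read this way, the command maps the reads in filename.fastq to the mm9 index on two cores, allows up to two mismatches per read, keeps only reads that have a single possible alignment, and writes the result to filename.map.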
26. NGS data after mapping: .bed files (BED format). Alignment tools: Bowtie, BWA, ELAND, Novoalign, BLAST, ClustalW; TopHat (for RNA-seq).
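A rough sketch of how one BED line can be read in Python. It assumes the standard tab-separated column order (chromosome, 0-based start, end that is not included, then optional name, score and strand), which is not spelled out on the slide; `BedInterval` and `parse_bed_line` are names invented for this example.

```python
from typing import NamedTuple


class BedInterval(NamedTuple):
    chrom: str
    start: int    # 0-based, included
    end: int      # not included, so length = end - start
    strand: str   # "+", "-" or "." when the column is absent


def parse_bed_line(line: str) -> BedInterval:
    """Parse one tab-separated BED line with at least three columns."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 columns: {line!r}")
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid coordinates: {line!r}")
    strand = fields[5] if len(fields) > 5 else "."
    return BedInterval(fields[0], start, end, strand)


# Hypothetical mapped read of length 36 on the plus strand.
read = parse_bed_line("chr1\t100\t136\tread1\t0\t+")
print(read, "length:", read.end - read.start)
```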
27. Reads can align to overlapping locations. We need to count all reads at each base pair. http://biocluster.ucr.edu/~rkaundal/workshops/R_feb2016/ChIPseq/ChIPseq.html
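Counting reads at each base pair can be sketched in a few lines of Python. This is a simplified illustration, not the method of any particular tool: reads are assumed to be (start, end) pairs on one chromosome with the end not included, and the function name `per_base_coverage` is hypothetical.

```python
from typing import List, Sequence, Tuple


def per_base_coverage(reads: Sequence[Tuple[int, int]],
                      region_start: int, region_end: int) -> List[int]:
    """Number of reads covering each base of [region_start, region_end)."""
    length = region_end - region_start
    # +1 where a read starts, -1 just after it ends; a running sum of
    # these changes gives the coverage at every position.
    changes = [0] * (length + 1)
    for start, end in reads:
        clipped_start = max(start, region_start)
        clipped_end = min(end, region_end)
        if clipped_start < clipped_end:
            changes[clipped_start - region_start] += 1
            changes[clipped_end - region_start] -= 1
    coverage = []
    running = 0
    for change in changes[:length]:
        running += change
        coverage.append(running)
    return coverage


# Three hypothetical overlapping reads in an 8 bp region.
coverage = per_base_coverage([(0, 4), (2, 6), (3, 5)], 0, 8)
print(coverage)  # [1, 1, 2, 3, 2, 1, 0, 0]
```

The resulting list is a small occupancy profile: the highest value marks the position covered by the most reads.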
28. From mapped reads to occupancy landscapes. Tools: HOMER, BedTools, BamTools, NucTools. Teif et al., Methods, 2012
29. Calculating occupancy with HOMER: "makeTagDirectory <Directory Name> [options] <alignment file>". http://homer.ucsd.edu/homer/ngs/tagDir.html
30. Quality control (QC). http://homer.ucsd.edu/homer/ngs/tagDir.html
31. Quality control (QC): good ChIP-seq vs. bad ChIP-seq. http://homer.ucsd.edu/homer/ngs/tagDir.html
32. Data view in genome browsers: UCSC Genome Browser (online); IGV (install on a local computer). Jung et al., NAR 2014
33. UCSC Genome Browser. https://genome.ucsc.edu/
34. Create UCSC files with HOMER: "makeUCSCfile <tag directory> -o auto". http://homer.ucsd.edu/homer/ngs/ucsc.html
35.
36. Peak shapes can be different. Park P. J., Nature Genetics, 2009
37. Systematic analysis requires identifying all peaks in all datasets and comparing the differences. Bardet et al. (2012), Nature Protocols, 7, 45-61
38. Peak calling is a method to identify areas in a genome enriched with aligned reads. Wilbanks EG (2010), PLoS ONE 5, e11471
39. Peak calling: finding the peaks. Input: a sample prepared in the same way as the ChIP-seq sample, but with no antibody added, so it has no specific enrichment of our protein of interest. Pepke et al. (2009), Nature Methods, 6, S22–S32
40. Peak calling: defining statistical significance
41. Peak calling: defining statistical significance. Is this peak statistically significant? Tools: MACS (good for TFs), SICER (histones, etc.), HOMER (universal), PeakSeq, edgeR, CisGenome. Park P. J., Nature Genetics, 2009
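To make the question "is this peak statistically significant?" concrete, here is a deliberately simple sketch. It assumes that read counts in a window follow a Poisson distribution whose mean is the count expected from the control; this is a common textbook simplification, not the exact model of any tool listed on the slide. The function name `poisson_p_value` and the example numbers are hypothetical.

```python
import math


def poisson_p_value(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    # Add up P(X = 0), ..., P(X = observed - 1), then take the complement.
    term = math.exp(-expected)   # P(X = 0)
    cumulative = 0.0
    for k in range(observed):
        cumulative += term
        term *= expected / (k + 1)   # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cumulative)


# Hypothetical windows where the control suggests 5 reads on average.
weak_peak = poisson_p_value(6, 5.0)     # barely above expectation
strong_peak = poisson_p_value(20, 5.0)  # far above expectation
print(f"6 reads: p = {weak_peak:.3f}; 20 reads: p = {strong_peak:.2e}")
```

With a cutoff such as 0.0001, the window with 20 reads would be called a peak and the window with 6 reads would not.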
42. Finding peaks with HOMER. http://homer.ucsd.edu/homer/ngs/peaks.html
43. Guess what this command does: "findPeaks ChIPDirectory -style factor -i InputDirectory". We need to map our ChIP-seq and its Input (control), then create their HOMER tag directories ChIPDirectory and InputDirectory, then find peaks using both these directories. Additional optional parameters: -F <#> enrichment ratio ChIP vs. Input (by default 4-fold); -P <#> P-value cutoff (by default 0.0001).
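The idea behind an enrichment ratio of ChIP vs. Input can be sketched as follows. This is an illustration under stated assumptions, not HOMER's implementation: the Input count is rescaled to the sequencing depth of the ChIP sample, and a pseudocount of 1 avoids division by zero. The function name `fold_enrichment`, the pseudocount and the example numbers are all hypothetical; only the 4-fold default comes from the slide.

```python
def fold_enrichment(chip_reads: int, input_reads: int,
                    chip_total: int, input_total: int,
                    pseudocount: float = 1.0) -> float:
    """Ratio of ChIP reads to depth-scaled Input reads in one region."""
    scale = chip_total / input_total   # corrects for different library sizes
    return (chip_reads + pseudocount) / (input_reads * scale + pseudocount)


MIN_FOLD = 4.0  # default enrichment ratio quoted on the slide

# Hypothetical region: 50 ChIP reads and 5 Input reads,
# both libraries with one million mapped reads.
ratio = fold_enrichment(50, 5, 1_000_000, 1_000_000)
print(f"fold enrichment = {ratio:.1f}, passes cutoff: {ratio >= MIN_FOLD}")
```

Without the depth correction, a region could look enriched simply because the ChIP library was sequenced more deeply than the Input.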
44. ChIP-seq: reads to peaks/regions. Tools: MACS, SICER, HOMER, PeakSeq, edgeR, DESeq, CisGenome.
45. Peaks/regions in BED format. Conversion commands: "pos2bed.pl peakfile.txt > peakfile.bed" and "bed2pos.pl peakfile.bed > peakfile.txt".
46. Intersecting genomic regions: BedTools (command line); Galaxy (online).
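What "intersecting" two sets of regions means can be shown with a short sketch. It is not how BedTools or Galaxy work internally; it assumes regions are (chromosome, start, end) tuples with the end not included, and that the regions within each list do not overlap one another (for example, already merged peaks). The function name `intersect_regions` and the coordinates are hypothetical.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]   # (chromosome, start, end), end not included


def intersect_regions(first: List[Region], second: List[Region]) -> List[Region]:
    """Return the segments shared by two lists of regions."""
    a, b = sorted(first), sorted(second)
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a < chrom_b:
            i += 1
        elif chrom_b < chrom_a:
            j += 1
        else:
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:
                shared.append((chrom_a, start, end))
            # Move past whichever region ends first.
            if end_a <= end_b:
                i += 1
            else:
                j += 1
    return shared


# Hypothetical TF peaks and histone-mark domains.
tf_peaks = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)]
domains = [("chr1", 150, 350), ("chr2", 10, 60)]
overlaps = intersect_regions(tf_peaks, domains)
print(overlaps)
```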
47. Genomic features are also regions. Mattout et al., Genome Biology, 2015
48. Let’s look at many similar regions. Each horizontal line is one genomic region. Tools: deepTools, NucTools. https://github.com/fidelram/deepTools/wiki/Visualizations
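The data behind such a heatmap is just a matrix with one row per region, and averaging its columns gives an aggregate profile. The sketch below illustrates this with plain Python lists; it is not the deepTools or NucTools implementation. It assumes a per-base signal for one chromosome and regions defined by a center position plus a fixed flank, and the names `signal_matrix` and `aggregate_profile` are hypothetical.

```python
from typing import List, Sequence


def signal_matrix(signal: Sequence[float], centers: Sequence[int],
                  flank: int) -> List[List[float]]:
    """One row per region: signal from center - flank to center + flank."""
    rows = []
    for center in centers:
        # Skip regions whose window would run off either end of the signal.
        if center - flank < 0 or center + flank >= len(signal):
            continue
        rows.append(list(signal[center - flank:center + flank + 1]))
    return rows


def aggregate_profile(matrix: List[List[float]]) -> List[float]:
    """Average the rows column by column."""
    return [sum(column) / len(column) for column in zip(*matrix)]


# Hypothetical 10 bp signal and three region centers; the third center is
# too close to the end of the signal and is skipped.
signal = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
matrix = signal_matrix(signal, centers=[2, 5, 9], flank=2)
profile = aggregate_profile(matrix)
print(matrix)
print(profile)
```

Plotting the matrix with a color scale gives the heatmap, and plotting the averaged list gives the aggregate profile for the same set of regions.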
49. ChIP-seq heat maps for all genes, scaled with respect to their start (TSS) and end (TES). https://github.com/fidelram/deepTools/wiki/Visualizations
50. Cluster heatmaps (deepTools 2.0). https://github.com/fidelram/deepTools/wiki/Visualizations
51. Comparing cluster heatmaps between two cell conditions (NucTools).
52. Histone modifications around TSS (deepTools). http://www.ie-freiburg.mpg.de/bioinformaticsfac
53. Motif enrichment analysis: HOMER, MEME. Pavlaki et al., 2017
54. Finding motifs with HOMER. HOMER takes the coordinates of all ChIP-seq peaks, looks at the corresponding DNA sequences of each peak and finds the common consensus motifs that are encountered in many of these peaks. Then HOMER looks in a database and reports which motifs are similar to already known TF binding motifs, and which motifs are new.
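The core idea of finding sequences "encountered in many of these peaks" can be illustrated with a toy k-mer count. This is far simpler than HOMER's actual algorithm: it only counts exact k-mers, and it compares peaks against a set of background sequences, which is an assumption of this sketch. All names, sequences and the ratio cutoff are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def sequences_containing(sequences: Sequence[str], k: int) -> Counter:
    """For each k-mer, count the sequences in which it occurs at least once."""
    counts: Counter = Counter()
    for sequence in sequences:
        sequence = sequence.upper()
        counts.update({sequence[i:i + k] for i in range(len(sequence) - k + 1)})
    return counts


def enriched_kmers(peaks: Sequence[str], background: Sequence[str],
                   k: int, min_ratio: float = 2.0) -> List[Tuple[str, float]]:
    """k-mers found in a larger fraction of peaks than of background."""
    in_peaks = sequences_containing(peaks, k)
    in_background = sequences_containing(background, k)
    results = []
    for kmer, count in in_peaks.items():
        peak_fraction = count / len(peaks)
        # Add 1 to both counts so unseen k-mers do not divide by zero.
        background_fraction = (in_background[kmer] + 1) / (len(background) + 1)
        ratio = peak_fraction / background_fraction
        if ratio >= min_ratio:
            results.append((kmer, ratio))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Hypothetical peak sequences that all share CCGG, plus AT-rich background.
peak_sequences = ["AACCGGTT", "TTCCGGAA", "GGCCGGAT"]
background_sequences = ["AAAATTTT", "ATATATAT", "TTTTAAAA"]
top = enriched_kmers(peak_sequences, background_sequences, k=4)
print(top)
```

A real motif finder would also allow mismatches within a motif and would compare the result with a database of known TF binding motifs, as described on the slide.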
55. The MEME Suite is even more sophisticated and contains all tools that are needed for motif analysis. http://meme-suite.org
56. Summary of ChIP-seq analysis: map all reads; occupancy calculation; differential peak calling; intersection of different signals; correlation of different signals; motif enrichment in peaks.
57. Take home message. Raw reads -> mapping -> peak calling. There are hundreds of types of NGS experiments; we focus on chromatin. Where NGS data is stored (GEO, etc.). ChIP-seq data structure. MUST KNOW: raw data; mapped reads; regions; sites; genome browsers; peaks; peak calling; heatmap; aggregate profile; gene ontology (GO). Optional video: https://www.youtube.com/watch?v=Ob9xGBPvr_s