Alexandre Gillet-Markowska (Alexandre.gillet-markowska@upmc.fr), Gilles Fischer Team, Biology of Genomes, UMR7238, Laboratory of Computational and Quantitative Biology, Université Pierre et Marie-Curie, Paris
1. Discovery of Structural Variation with Next-Generation Sequencing. Alexandre Gillet-Markowska (Alexandre.gillet-markowska@upmc.fr). Gilles Fischer Team, Biology of Genomes, UMR7238, Laboratory of Computational and Quantitative Biology, Université Pierre et Marie-Curie, Paris.
2. Outline: (i) Structural variations (SV); (ii) SV detection technologies; (iii) Read pairs: 2 types of Illumina genomic DNA libraries; (iv) SV detection using read pairs; (v) Polymorphic SV.
3. Structural Variations (SV). Footnote 1: Yes, the minimal size is arbitrary…
4. Structural Variations (SV)
8. Balanced SV versus Unbalanced SV. Figure panels, each drawn as reference (ref) versus SV: inversion (INV), reciprocal translocation (RT), insertion (INS), deletion (DEL), tandem duplication (DUP). Classification labels: balanced SV versus unbalanced SV (CNV); intrachromosomal SV versus interchromosomal SV. Pictures adapted from Feuk et al., 2006, Nature Reviews; Calvin Blackman Bridges, Science.
9. Why discover SV? They are involved in > 30 diseases (psoriasis, Crohn disease, ASD…); chromosomal instability is detected in the vast majority of cancers; SV are a powerful mechanism of adaptation and evolution.
10. SV detection technologies
11. Timeline of technologies used to discover SV (SV, Structural Variations since 1936). Comparative cytogenetics: 1936, Calvin Blackman Bridges, Science; 1959, Lejeune, Study of somatic chromosomes from 9 mongoloid children, Hebd Seances Acad Sci; 1986, Smith et al, Interstitial deletion of (17)(p11.2p11.2) in nine patients, Am J Med Genet.
12. Timeline of technologies used to discover SV (SV, Structural Variations since 1936). Comparative cytogenetics: 1936, Calvin Blackman Bridges, Science; 1959, Lejeune, Study of somatic chromosomes from 9 mongoloid children, Hebd Seances Acad Sci; 1986, Smith et al, Interstitial deletion of (17)(p11.2p11.2) in nine patients, Am J Med Genet. Microarrays: 2004, Iafrate, Detection of large-scale variation in the human genome, Nature, and Sebat, Large-scale copy number polymorphism in the human genome, Science; 2006, Redon, Global variation in copy number in the human genome, Nature. Annotations: 200 and 221 CNV; 360 Mb of CNVR (12% of the human genome).
13. Timeline of technologies used to discover SV (SV, Structural Variations since 1936). Comparative cytogenetics: 1936, Calvin Blackman Bridges, Science; 1959, Lejeune, Study of somatic chromosomes from 9 mongoloid children, Hebd Seances Acad Sci; 1986, Smith et al, Interstitial deletion of (17)(p11.2p11.2) in nine patients, Am J Med Genet. Microarrays: 2004, Iafrate, Detection of large-scale variation in the human genome, Nature, and Sebat, Large-scale copy number polymorphism in the human genome, Science; 2006, Redon, Global variation in copy number in the human genome, Nature. Annotations: 200 and 221 CNV; 360 Mb of CNVR (12% of the human genome). NGS: 2007, Korbel et al, Paired-end mapping reveals extensive structural variation in the human genome, Science; 2010, 1000 HGP, A map of human genome variation from population-scale sequencing, Nature. Annotations: 20 000 SV; 1 000 SV.
14. ‘Range of usability’ of technologies: size limit and SV type limit.
16. SV detection with NGS data
17. How to detect SV with NGS data? (Quinlan & Hall 2011, Trends in Genetics; LI 2011, Nature). Columns: breakpoint resolution; SV size range; CNV; balanced SV; FDR; missing rate.
Row 1: >100 bp; > insert size; yes; yes; variable; variable.
Row 2: 1 bp; 1 bp to 50 kbp; yes; yes; >10%; >25%.
Row 3: 1-10 bp; >10 bp; yes; no; high?; high?
Row 4: 1 bp; >1 bp; yes; yes; low; high?
18. Read pairs: 2 types of Illumina genomic DNA libraries: 1) Illumina Paired-End; 2) Illumina Mate-Pair.
19. 1) Illumina Paired-End
20. 2) Illumina Mate-Pair
21. Illumina Paired-End vs Mate-Pair. MP allows a better genome assembly than PE. MP makes it possible to detect SV that involve repeated elements.
22. Illumina Paired-End vs Mate-Pair. Figure: insert-size distribution of 100,000 read pairs (x-axis: insert size, bp). Annotation: 5,000 (or much less…).
24. SV detection with read pairs: (1) trim the data; (2) align the data to the reference genome; (3) remove PCR duplicates; (4) SV calling.
25. Trim the data. First criterion: Chargaff rule.
26. Trim the data. First criterion: %A = %T and %G = %C on both DNA strands.
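The first trimming criterion can be illustrated with a small base-composition check. This is only a minimal sketch: the function names and the 2% tolerance are assumptions made for the example, and in practice the same check is often applied per read position to decide where to trim rather than over the whole data set.

```python
from collections import Counter

def base_composition(reads):
    """Return the fraction of A, C, G and T over all reads (other symbols are ignored)."""
    counts = Counter()
    for read in reads:
        counts.update(base for base in read.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {base: counts[base] / total for base in "ACGT"}

def passes_chargaff(reads, tolerance=0.02):
    """True when |%A - %T| and |%G - %C| are both within the tolerance.

    The tolerance value is a hypothetical choice, not taken from the slides.
    """
    comp = base_composition(reads)
    return (abs(comp["A"] - comp["T"]) <= tolerance
            and abs(comp["G"] - comp["C"]) <= tolerance)

if __name__ == "__main__":
    reads = ["ACGTACGT", "TTAACCGG", "GATCGATC"]
    print(base_composition(reads))
    print(passes_chargaff(reads))
```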
27. Trim the data. Second criterion: nucleotide quality. Trimming tools: Bcbio-nextgen, Btrim, CANGS, Chipster, Clean reads, ConDeTri, Ea-utils, Fastx, Flexbar, PRINSEQ, Reaper, SeqTrim, Skewer, SolexaQA, TagCleaner, Trimmomatic.
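Quality-based trimming can be sketched as removing low-quality bases from the 3' end of each read. The code below is an illustration under stated assumptions (Phred+33 encoding, a threshold of Q20, hypothetical function names); it does not reproduce the algorithm of any of the tools listed above.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed by default)."""
    return [ord(ch) - offset for ch in quality_string]

def trim_3prime(sequence, quality_string, min_quality=20, offset=33):
    """Remove trailing bases whose Phred score is below min_quality.

    Returns the trimmed sequence and the matching trimmed quality string.
    """
    if len(sequence) != len(quality_string):
        raise ValueError("sequence and quality string must have the same length")
    scores = phred_scores(quality_string, offset)
    end = len(scores)
    # Walk back from the 3' end until a base of acceptable quality is found.
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_string[:end]

if __name__ == "__main__":
    # 'I' encodes Q40, '#' encodes Q2 and '!' encodes Q0 in Phred+33.
    print(trim_3prime("ACGTACGT", "IIIII##!"))
```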
28. Align the data to reference genome
29. Remove PCR duplicates. PCR duplicates annotation tools: samtools rmdup (only intra-molecular duplicates), markduplicates.jar (picard tools), FastUniq…
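The idea behind PCR duplicate removal can be sketched as follows: read pairs that map to exactly the same coordinates and strands are assumed to derive from the same original fragment, and only one of them is kept. This is a simplified model, not the exact logic of samtools rmdup, Picard or FastUniq; the `ReadPair` record and the rule of keeping the pair with the highest summed mapping quality are assumptions made for the example.

```python
from typing import NamedTuple

class ReadPair(NamedTuple):
    """Hypothetical minimal record for an aligned read pair."""
    name: str
    chrom1: str
    pos1: int
    strand1: str
    chrom2: str
    pos2: int
    strand2: str
    mapq_sum: int

def remove_duplicates(pairs):
    """Keep one pair per set of identical mapping coordinates.

    Among pairs sharing the same coordinates and strands, the pair with the
    highest summed mapping quality is retained.
    """
    best = {}
    for pair in pairs:
        key = (pair.chrom1, pair.pos1, pair.strand1,
               pair.chrom2, pair.pos2, pair.strand2)
        kept = best.get(key)
        if kept is None or pair.mapq_sum > kept.mapq_sum:
            best[key] = pair
    return list(best.values())

if __name__ == "__main__":
    pairs = [
        ReadPair("a", "chr1", 100, "+", "chr1", 500, "-", 60),
        ReadPair("b", "chr1", 100, "+", "chr1", 500, "-", 80),  # same coordinates as "a"
        ReadPair("c", "chr1", 200, "+", "chr2", 900, "-", 70),
    ]
    print([pair.name for pair in remove_duplicates(pairs)])
```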
30. SV signatures. SV have nearly identical signatures with MP and PE.
31. SV signatures. Gillet-Markowska, 2014, Bioinformatics.
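As an illustration of what a read-pair signature is, the sketch below classifies a single aligned pair from its chromosomes, positions, strands and apparent insert size. It uses signatures commonly described for read-pair methods and is not necessarily the rule set of the cited publication. Assumptions: pairs are expected in forward/reverse orientation (mate-pair reads would first be converted to that convention), the insert size is approximated by the distance between leftmost coordinates, and the function name and labels are hypothetical.

```python
def classify_pair(chrom1, pos1, strand1, chrom2, pos2, strand2,
                  min_insert, max_insert):
    """Return a label for the signature carried by one aligned read pair."""
    if chrom1 != chrom2:
        return "interchromosomal"  # for example a translocation
    # Order the two reads by their position on the chromosome.
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos1, strand1), (pos2, strand2)])
    if left_strand == right_strand:
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        return "tandem_duplication"  # reads point away from each other
    span = right_pos - left_pos
    if span > max_insert:
        return "deletion"  # pair maps further apart than expected
    if span < min_insert:
        return "insertion"  # pair maps closer together than expected
    return "concordant"

if __name__ == "__main__":
    print(classify_pair("chr1", 1000, "+", "chr1", 5000, "-", 200, 600))
```

In a real caller, a single discordant pair is rarely trusted on its own; pairs with the same signature and nearby coordinates are usually clustered before an SV is reported.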
34. Inter-tool variability is immense
36. Inter-tool variability is immense. Adapted from ICGC-TCGA challenge.
38. SV examples
39. SV in the Human genome. Korbel et al, Science 2007.
40. Not-so-identical monozygotic twins. Bruder, C. E. G. et al. Phenotypically concordant and discordant monozygotic twins display different DNA copy-number-variation profiles. Am. J. Hum. Genet. 82, 763–771 (2008).
41. Butterfly mimicry
43. Livestock phenotypes caused by CNV
44. Polymorphic SV
45. Polymorphic SV. Individual (germ line): SV in 100% of cells of each individual. Tissue (somatic): SV in one tissue / in a few cells.
46. Sequencing a single culture. Can we detect de novo SV occurring in a single cell culture by high-throughput sequencing? The physical coverage (theoretically) sets the detection threshold. Figure (S. cerevisiae): growth of the culture (axes: # generations and # cells) through successive bottlenecks (Bottleneck 1 to Bottleneck 80, # generations from 0 to 2400 in steps of 30), followed by DNA extraction and sequencing (n=80). Coverage annotations: 6,000X and 700X.
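The statement that physical coverage sets the detection threshold can be made concrete with a simple expectation: if an SV is carried by a fraction f of the cells and the physical coverage is C, about C × f read pairs are expected to span its breakpoint. The sketch below is only a back-of-the-envelope model. Assumptions: coverage is uniform, the function names are hypothetical, the minimum of 2 supporting pairs is an arbitrary example, and the 700X and 6,000X values are reused from the slide purely as example inputs, treated here as physical coverages.

```python
def expected_supporting_pairs(physical_coverage, cell_fraction):
    """Expected number of read pairs spanning an SV present in cell_fraction of the cells."""
    return physical_coverage * cell_fraction

def min_detectable_fraction(physical_coverage, min_supporting_pairs=2):
    """Smallest cell fraction expected to yield at least min_supporting_pairs pairs."""
    return min_supporting_pairs / physical_coverage

if __name__ == "__main__":
    for coverage in (700, 6000):
        fraction = min_detectable_fraction(coverage)
        print(f"{coverage}X: SV detectable down to about 1 cell in {1 / fraction:.0f}")
```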
47. Paired-End sequencing: insert size ~ 400 bp. Sequencing with high physical coverage. Figure: read pairs from Cell 1 to Cell 10 aligned to the reference.
49. Paired-End sequencing: insert size ~ 400 bp. Sequencing with high physical coverage. Figure: read pairs from Cell 1 to Cell 10 aligned to the reference, with a sequence coverage track (axis 0 to 2): covseq = 0.5X.
50. Paired-End sequencing: insert size ~ 400 bp. Sequencing with high physical coverage. Figure: read pairs from Cell 1 to Cell 10 aligned to the reference, with a sequence coverage track (axis 0 to 2): covseq = 0.5X, and a physical coverage track (axis 0 to 2): covphys = 0.85X.
51. Paired-End sequencing: insert size ~ 400 bp. Sequencing with high physical coverage. Figure: read pairs from Cell 1 to Cell 10 aligned to the reference, with a sequence coverage track (axis 0 to 2): covseq = 0.5X, and a physical coverage track (axis 0 to 2): covphys = 0.85X. Coverage of the SV: covSV = 0.
52. Mate-Pair sequencing: insert size ~ 1 to 20 kb. Sequencing with high physical coverage. Figure: read pairs from Cell 1 to Cell 10 aligned to the reference, with a discordant paired sequence highlighted.
53. Mate-Pair sequencing: insert size ~ 1 to 20 kb. Sequencing with high physical coverage. Figure: read pairs from Cell 1 to Cell 10 aligned to the reference, with a sequence coverage track (axis 0 to 2): covseq = 0.5X, and a physical coverage track (axis 0 to 10): covphys = 5X. Coverage of the SV: covSV = 1 (discordant paired sequence). Mate-Pair sequencing increases the sensitivity of SV detection.
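The difference between sequence coverage and physical coverage can be sketched numerically. Sequence coverage counts sequenced bases, whereas physical coverage counts the bases spanned by each pair, so a longer insert raises physical coverage for the same sequencing effort. The values below (12 Mb genome, 100 bp reads, 30,000 pairs, 400 bp and 4,000 bp inserts) are hypothetical and do not reproduce the figures on the slides; the insert size is assumed to be the full fragment length including both reads.

```python
def sequence_coverage(n_pairs, read_length, genome_size):
    """Average number of times each base is sequenced (two reads per pair)."""
    return 2 * n_pairs * read_length / genome_size

def physical_coverage(n_pairs, insert_size, genome_size):
    """Average number of read pairs whose insert spans each base."""
    return n_pairs * insert_size / genome_size

if __name__ == "__main__":
    genome_size = 12_000_000  # hypothetical genome size in bp
    n_pairs = 30_000          # hypothetical number of read pairs
    read_length = 100         # hypothetical read length in bp
    print("covseq:", sequence_coverage(n_pairs, read_length, genome_size))
    for label, insert in (("Paired-End", 400), ("Mate-Pair", 4_000)):
        print(label, "covphys:", physical_coverage(n_pairs, insert, genome_size))
```

With identical sequence coverage, the tenfold longer insert gives a tenfold higher physical coverage, which is the reason more pairs are expected to span any given breakpoint.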