Slide1
Cancer Sequencing
Slide2
What is Cancer?
Definitions
A class of diseases characterized by malignant growth of a group of cells
Growth is uncontrolled
Invasive and Damaging
Often able to metastasize
An instance of such a disease (a malignant tumor)
A disease of the genome
http://en.wikipedia.org/wiki/Cancer
http://faculty.ksu.edu.sa/tatiah/Pictures%20Library/normal%20male%20karyotyping.jpg
Slide3
What is Cancer?
http://www.moffitt.org/CCJRoot/v2n5/artcl2img4.gif
Slide4
Fundamental Changes in Cancer Cell Physiology
Evasion of anti-cancer control mechanisms
Apoptosis (e.g. p53)
Antigrowth signals (e.g. pRb)
Cell Senescence
Exploitation of natural pathways for cellular growth
Growth Signals (e.g. TGF family)
Angiogenesis
Tissue Invasion & Metastasis
Acceleration of Cellular Evolution Via Genome Instability
DNA Repair
DNA Polymerase
Hanahan and Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.
Slide5
Many Paths Lead to Cancer Self-Sufficiency
Hanahan, Douglas, and Robert A. Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.
Slide6
Cancer Heterogeneity
Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–13 (2012).
Slide7
Why Sequence Cancer Genomes?
Better understand cancer biology
Pathway information
Types of mutations found in different cancers
Slide8
Why Sequence Cancer Genomes?
Better understand cancer biology
Pathway information
Types of mutations found in different cancers
Cancer Diagnosis
Genetic signatures of cancer types will inform diagnosis
Non-invasive means of detecting or confirming presence of cancer
Improve cancer therapies
Targeted treatment of cancer subtypes
COSMIC Database, v48, July 2010
http://www.sanger.ac.uk/genetics/CGP/cosmic/
Samples: 544809
Mutations: 141212
Papers: 10383
Whole Genomes: 29
Forbes et al. 2011. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research 39: D945-D950
Slide9
Why Sequence Cancer Genomes?
Better understand cancer biology
Pathway information
Types of mutations found in different cancers
Cancer Diagnosis
Genetic signatures of cancer types will inform diagnosis
Non-invasive means of detecting or confirming presence of cancer
Improve cancer therapies
Targeted treatment of cancer subtypes
COSMIC Database, v71, Oct 2014
http://www.sanger.ac.uk/genetics/CGP/cosmic/
Samples: 1058292
Mutations: 2710449
Papers: 20247
Whole Genomes: 15047
Slide10
How Do We Sequence Cancer Genomes?
Slide11
How Do We Sequence Cancer Genomes?
Slide12
Read Mapping
Slide13
Definition of Coverage
Length of genomic segment: L
Number of reads: n
Length of each read: l
Definition: Coverage C = n l / L
How much coverage is enough?
Lander-Waterman model: assuming a uniform distribution of reads, C = 10 results in 1 gapped region per 1,000,000 nucleotides
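As a rough check of the Lander-Waterman figure, the sketch below computes C = n l / L and the expected number of gaps. It assumes the usual approximation of about n * exp(-C) gaps among n reads (ignoring any minimum overlap) and a read length of 500 bases. Neither value is stated on the slide, and the function names are illustrative.

```python
import math

def coverage(n_reads, read_len, genome_len):
    """Coverage C = n * l / L."""
    return n_reads * read_len / genome_len

def expected_gaps_per_base(c, read_len):
    """Lander-Waterman approximation: about n * exp(-C) gaps over L bases,
    which is (C / l) * exp(-C) gaps per base."""
    return (c / read_len) * math.exp(-c)

# Assumed example: 20,000 reads of 500 bases over a 1 Mb segment.
C = coverage(n_reads=20_000, read_len=500, genome_len=1_000_000)
gaps_per_mb = expected_gaps_per_base(C, read_len=500) * 1_000_000
print(f"C = {C:.1f}, expected gaps per 1,000,000 nucleotides = {gaps_per_mb:.2f}")
```

With these assumed values the estimate comes out close to one gap per million nucleotides. Shorter reads at the same coverage would give proportionally more gaps.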
Slide14
Read Mapping
BWA
Slide15
Paired-End Read Mapping
Reference
Physical Coverage: 4
Sequence Coverage: 2
Physical coverage refers to the genomic coverage including the unsequenced regions of each DNA fragment
Sequence coverage refers to the genomic coverage counting only the sequenced part of each DNA fragment
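The two definitions can be compared numerically. This is a minimal sketch that assumes every fragment has the same length and that the two reads of a pair do not overlap. The library sizes are made up and chosen only so that the result matches the 4 and 2 shown on the slide.

```python
def paired_end_coverage(n_pairs, read_len, fragment_len, genome_len):
    """Return (sequence_coverage, physical_coverage) for a paired-end library.

    Sequence coverage counts only the two sequenced ends of each fragment.
    Physical coverage counts the whole fragment, including the unsequenced middle.
    """
    if fragment_len < 2 * read_len:
        raise ValueError("sketch assumes the two reads of a pair do not overlap")
    sequence = n_pairs * 2 * read_len / genome_len
    physical = n_pairs * fragment_len / genome_len
    return sequence, physical

# Hypothetical library: 1M pairs of 100 bp reads from 400 bp fragments over 100 Mb.
seq_cov, phys_cov = paired_end_coverage(
    n_pairs=1_000_000, read_len=100, fragment_len=400, genome_len=100_000_000
)
print(seq_cov, phys_cov)
```

Doubling `fragment_len` while leaving `read_len` unchanged doubles the physical coverage and leaves the sequence coverage as it was.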
Increased gap length between paired reads provides higher physical coverage without incurring increased costs for sequencing, which is useful for detecting certain types of mutations
Slide16
Considerations for Cancer Sequencing
Factors that affect mutation signal
Limited genetic material (lower depth)
Mixture of tumor and normal tissue
Cancer Heterogeneity
Factors that introduce noise
Formalin-fixed and Paraffin-embedded samples
Increased number of mutations and unusual genomic rearrangements
General Consideration
Each individual has many unique mutations that could be confused with cancer-causing mutations
Slide17
Human Genome Variation
[Figure: classes of human genome variation, with sequence examples where given]
SNP: TGCTGAGA / TGCCGAGA
Novel Sequence: TGCTCGGAGA / TGC---GAGA
Microdeletion: TGC--AGA / TGCCGAGA
Inversion
Mobile Element or Pseudogene Insertion
Translocation
Tandem Duplication
Transposition
Large Deletion
Novel Sequence at Breakpoint
Slide18
Variant Types
Single Nucleotide Variants (SNVs)
Small Insertion / Deletion (indels)
Copy Number Variants (CNVs)
Structural Variants (SVs)
Novel Sequence
Slide19
SNV Calling
A Bayesian approach is the most general and common method of calling SNVs
MAQ, SOAPsnp, Genome Analysis Toolkit (GATK), SAMtools
Slide20
SNV Calling
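A Bayesian SNV caller can be sketched at a single site as follows. This is an illustrative toy model and not the model used by MAQ, SOAPsnp, GATK, or SAMtools. The genotype priors and the per-base error rate are assumed values, and reads are treated as independent.

```python
import math

def genotype_posteriors(ref_count, alt_count, error_rate=0.01,
                        priors=(0.998, 0.001, 0.001)):
    """Posterior over (hom-ref, het, hom-alt) given allele counts at one site."""
    # Probability that one read shows the alternate allele under each genotype.
    p_alt = (error_rate, 0.5, 1.0 - error_rate)
    log_post = []
    for prior, p in zip(priors, p_alt):
        # Binomial likelihood; the binomial coefficient is the same for every
        # genotype, so it cancels in the normalization and is omitted.
        log_lik = alt_count * math.log(p) + ref_count * math.log(1.0 - p)
        log_post.append(math.log(prior) + log_lik)
    # Normalize in log space for numerical stability.
    top = max(log_post)
    weights = [math.exp(x - top) for x in log_post]
    total = sum(weights)
    return [w / total for w in weights]

post = genotype_posteriors(ref_count=12, alt_count=10)
labels = ("hom-ref", "het", "hom-alt")
print(labels[post.index(max(post))])
```

A site would be reported as a variant when the posterior of a non-reference genotype exceeds some chosen threshold. Real callers also use base and mapping qualities.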
http://www.broadinstitute.org/gatk//events/2038/GATKwh0-BP-5-Variant_calling.pdf
Slide21
SNV Calling
A given human genome (germline) differs from the reference genome at millions of positions.
A cancer genome differs from the healthy genome of its host at tens of thousands of positions at most, which is orders of magnitude fewer differences than germline versus reference.
How do we distinguish germline mutations from somatic mutations?
Slide22
Somatic SNV calling
Tumor Tissue
Normal Tissue
Compare the alignment results
Most naïve: use a standard SNV caller on both datasets. If a mutation is found in the tumor sample but not in the normal sample, it is somatic!
Slide23
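The naive tumor-versus-normal comparison amounts to a set difference over the two call sets. The sketch below assumes each call is a (chromosome, position, ref, alt) tuple, and the example calls are made up.

```python
def naive_somatic_calls(tumor_calls, normal_calls):
    """Variants called in the tumor sample but not in the matched normal."""
    return sorted(set(tumor_calls) - set(normal_calls))

# Hypothetical call sets from running the same SNV caller on both samples.
tumor = {("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C"), ("chr3", 42, "C", "G")}
normal = {("chr2", 500, "G", "C")}

somatic = naive_somatic_calls(tumor, normal)
print(somatic)
```

One weakness of this approach is that a germline variant missed in the normal sample, for example because of low coverage there, is reported as somatic.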
Somatic SNV calling
JointSNVMix: probabilistic graphical models for joint tumor-normal SNV calling
Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–13 (2012).
Slide24
Short Indel Calling
Reference
Deletion
Insertion
Slide25
Short Indel Calling
Reference
Deletion
Insertion
Reference
Read mapping in practice
Unmappable part of read (just the read end)
Unmapped read (could not be aligned anywhere)
Slide26
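Short Indel Calling – Discordant Read Pairs
[Figure: read pairs from a fragment of length l mapped to the reference]
I) Insertion of length i: the pair maps l - i apart
II) Deletion of length d: the pair maps l + d apart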
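The discordant-pair signal can be sketched as a comparison between the mapped distance of a pair and the expected fragment length l. The tolerance parameter is an assumption standing in for the natural spread of fragment lengths in a library, and the function name is illustrative.

```python
def classify_pair(mapped_distance, expected_len, tolerance):
    """Label a read pair by its mapped distance relative to the fragment length l."""
    delta = mapped_distance - expected_len
    if delta > tolerance:
        # Pair maps l + d apart: the sample lacks d bases present in the reference.
        return ("deletion", delta)
    if delta < -tolerance:
        # Pair maps l - i apart: the sample carries i bases absent from the reference.
        return ("insertion", -delta)
    return ("concordant", 0)

print(classify_pair(450, expected_len=400, tolerance=30))
print(classify_pair(340, expected_len=400, tolerance=30))
print(classify_pair(410, expected_len=400, tolerance=30))
```

In practice a call would rest on several pairs supporting the same event rather than on a single pair.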
Reference
Reference
Slide27
Short Indel Calling – Split Read Mapping
Reference
Reference
Deletion
Read mapping in practice
Slide28
Short Indel Calling – Split Read Mapping
Reference
Reference
Deletion
Read mapping in practice
Remap each end of the suspicious reads
Slide29
Paired-end mapping can improve power to detect variants without need for more sequencing
Modified from Meyerson et al. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696
Slide30
Copy Number Variants
Ref: A B C D E F G H I K
A B C D C E F G H C I K
Slide31
Copy Number Variants
Ref: A B C D E F G H I K
A B C D C E F G H C I K
[Figure: reads from the extra copies of C pile up on the reference copy of C]
Depth of Coverage
Modified from Dalca and Brudno. 2010. Genome variation discovery with high-throughput sequencing data. Briefings in Bioinformatics 11, no. 1: 3-14
Slide32
Copy Number Variants
Ref: A B C D E F G H I K
A B C D C E F G H C I K
Depth of Coverage
Problems with DOC
Very sensitive to stochastic variance in coverage
Sensitive to biased coverage (e.g. GC content)
Impossible to determine non-reference locations of CNVs
Graph methods using paired-end reads help overcome some of these problems
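A depth-of-coverage signal can be sketched by counting read starts in fixed windows and dividing by the median window count. This toy version assumes uniform sampling and applies no GC or other bias correction, so it has exactly the weaknesses listed for DOC. The read positions are synthetic.

```python
from statistics import median

def window_counts(read_starts, genome_len, window):
    """Count read start positions in consecutive fixed-size windows."""
    counts = [0] * ((genome_len + window - 1) // window)
    for pos in read_starts:
        counts[pos // window] += 1
    return counts

def copy_ratio(counts):
    """Depth of each window relative to the median window (assumed nonzero)."""
    baseline = median(counts)
    return [c / baseline for c in counts]

# Synthetic reads: even sampling, plus extra reads from a duplicated 400-599 segment.
starts = list(range(0, 1000, 10)) + list(range(400, 600, 10))
counts = window_counts(starts, genome_len=1000, window=200)
ratios = copy_ratio(counts)
print(counts, ratios)
```

The window covering the duplicated segment shows a ratio of about 2, but the depth alone says nothing about where the extra copy sits in the sample genome.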
Slide33
Copy Number Variants - CNAnorm
Overall steps in the CNAnorm method, a tool for detecting copy number changes in tumor samples
Data: sequence data from tumor and normal samples
Steps:
Count number of reads in fixed windows across the genome
Calculate ratio of reads in tumor vs. reads in normal for each window, correcting for sequence biases (e.g. GC)
Smooth ratio signal across windows
Normalize data
Estimate amount of normal contamination in tumor sample
Perform segmentation on tumor data
Gusnanto, A., Wood, H. M., Pawitan, Y., Rabbitts, P. & Berri, S. Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data. Bioinformatics 28, 40–7 (2012).
Slide34
Variant Types
[Figure: structural variants shown as changes to the order of numbered reference segments; panels: Structural Rearrangement, Translocation, Inversion, Large Insertion / Deletion]
Slide35
Summary of Variant Types
Meyerson et al. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696
Slide36
Passenger Mutations and Driver Mutations
[Figure: Normal and Cancer sequences, with mutations marked X]
Driver or Passenger?
Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–13 (2012).
Slide37
Passenger Mutations and Driver Mutations
Stratton, Michael R, Peter J Campbell, and P Andrew Futreal. 2009. The cancer genome. Nature 458, no. 7239 (April): 719-24. doi:10.1038/nature07943
Slide38
Passenger Mutations and Driver Mutations
Distinguishing Features
Presence in many tumors
Predicted to have functional impact on the cell
Conserved
Not seen in healthy adults (rare)
Predicted to affect protein structure
In pathways known to be involved in cancer
Train Classifier using Machine Learning Approaches
Carter et al. 2009. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Research, no. 16: 6660-6667
Slide39
Tracking the Evolution of Cancer
Slide40
Models of Breast Cancer Progression
Slide41
Models of Breast Cancer Progression
Slide42
What we did
Cancer phylogenetics
Lineage relationship of neoplastic lesions with cancers using somatic SNVs as lineage markers
Order of genomic events and drivers
Slide Courtesy of Arend Sidow
Slide43
Samples
[Table: patients P1 to P6 by sample type (Lymph, Normal, CCL, FEA, DCIS, IDC), with Side 1 and Side 2]
All samples are FFPE material
Slide Courtesy of Arend Sidow
Slide44
Samples
Slide45
Patient 1 Evolution – SNVs
Code: 0101
[Figure: reads from sequencing each patient sample, aligned to the human genome reference GGATAGTGTCCATGGCAAA]
Normal sample
Early neoplasia (EN) sample
EN with atypia (ENA) sample
Invasive ductal carcinoma (IDC) sample
Slide46
Patient 1 Evolution – SNVs
Code: 0101
[Figure: reads from sequencing each patient sample, aligned to the human genome reference, with variant base C marked]
Normal sample
Early neoplasia (EN) sample
EN with atypia (ENA) sample
Invasive ductal carcinoma (IDC) sample
Multisample SNV Code: 1 0 0 0 (Normal, EN, ENA, IDC)
Slide47
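One way to read the multisample SNV code is as one presence bit per sample, in the order Normal, EN, ENA, IDC. The sketch below builds such codes and tallies them. The SNV names and their presence calls are hypothetical and serve only to show the encoding.

```python
from collections import Counter

SAMPLES = ("Normal", "EN", "ENA", "IDC")

def snv_code(present_in):
    """Encode the set of samples carrying an SNV as a string of 0/1 flags."""
    return "".join("1" if sample in present_in else "0" for sample in SAMPLES)

# Hypothetical per-SNV presence calls.
calls = {
    "snv_a": {"Normal"},
    "snv_b": {"EN", "ENA"},
    "snv_c": {"ENA", "IDC"},
    "snv_d": {"ENA", "IDC"},
}

codes = {name: snv_code(samples) for name, samples in calls.items()}
tally = Counter(codes.values())
print(codes)
print(tally)
```

Tallying the codes over all SNVs of a patient gives a per-code count table, from which shared and private mutations can be read off.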
Patient 1 Evolution – SNVs
Code: 0101
[Figure: reads from sequencing each patient sample, aligned to the human genome reference, with variant base C marked]
Normal sample
Early neoplasia (EN) sample
EN with atypia (ENA) sample
Invasive ductal carcinoma (IDC) sample
Multisample SNV Code: 0 1 1 0 (Normal, EN, ENA, IDC)
Slide48
Patient 1 Evolution – SNVs
Code   Normal   EN    ENA   IDC   SUM
1000   89       -     -     -     89
0100   -        147   -     -     147
0010   -        -     102   -     102
0001   -        -     -     46    46
0011   -        -     755   755   755
Slide49
Patient 1 Evolution – SNVs
Code   Normal   EN    ENA   IDC   SUM
1000   89       -     -     -     89
0100   -        147   -     -     147
0010   -        -     102   -     102
0001   -        -     -     46    46
0011   -        -     755   755   755
[Figure: Venn diagram view of the same counts (755, 46, 102, 89, 147) for Normal, EN, ENA, IDC]
Slide50
Patient 1 Evolution – SNVs
Code   Normal   EN    ENA   IDC   SUM
1000   89       -     -     -     89
0100   -        147   -     -     147
0010   -        -     102   -     102
0001   -        -     -     46    46
0011   -        -     755   755   755
[Figure: P1, labeled with the same counts (755, 46, 102, 89, 147) for Normal, EN, ENA, IDC]
Slide51
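Patient 6 Evolution - SNVs
Code     Lymph   CCL_CL   CCL   FEA   DCIS   IDC   SUM
010000   -       219      -     -     -      -     219
001000   -       -        305   -     -      -     305
000100   -       -        -     345   -      -     345
000010   -       -        -     -     978    -     978
000001   -       -        -     -     -      608   608
000101   -       -        -     61    -      61    61
000110   -       -        -     185   185    -     185
000111   -       -        -     510   510    510   510
01XXXX   -       -        -     -     -      -     0
010101   -       -        -     -     -      -     0
000011   -       -        -     -     -      -     0
Total                                              3211
Slide52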
Lineage concepts
Somatic changes
Slide53
Cell Divisions in One Generation
~40 cell divisions
~60 new point mutations
Slide54
“Germline”
D?
You
Your dad
Slide55
“Germline”
You
Your dad
Mutations detected here ...
... but not here ...
D?
Slide56
Somatic (Tumor) Lineages
Sampled lesion
Slide57
Somatic (Tumor) Lineages
Sampled lesion
Slide58
Somatic (Tumor) Lineages
Sampled lesion 1
Sampled lesion 2
Slide59
Patient 6 Evolution - SNVs
Slide60
Patient 6 Evolution
Slide Courtesy of Arend Sidow
Slide61
Patient 6 Evolution – Copy Number Changes
Slide Courtesy of Arend Sidow
Slide62
Aneuploidies
Putting the germline SNPs to good use (no somatic SNVs for this!)
Slide63
Heterozygous Positions
m: G A T
p: A T C
50% 50% 50%
Slide64
LOH (e.g., paternal chromosome)
m: G A T
p: A T C
0% 0% 0%
Fraction of “lesser allele”
Slide65
Chromosome Duplication
m: G A T
p: A T C
A T C
66% 66% 66%
Slide66
Chromosome Duplication
m: G A T
p: A T C
A T C
33% 33% 33%
Fraction of “lesser allele”
Slide67
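The lesser allele fraction at a germline heterozygous position follows directly from the number of maternal and paternal chromosome copies. This is a minimal sketch for a pure tumor sample with no normal cells mixed in, and the function name is illustrative.

```python
def lesser_allele_fraction(maternal_copies, paternal_copies):
    """Fraction of reads expected from the less abundant parental allele."""
    total = maternal_copies + paternal_copies
    if total == 0:
        raise ValueError("no copies of the locus present")
    return min(maternal_copies, paternal_copies) / total

print(lesser_allele_fraction(1, 1))  # heterozygous, two copies
print(lesser_allele_fraction(1, 0))  # LOH of one parental chromosome
print(lesser_allele_fraction(1, 2))  # duplication of one parental chromosome
```

These three cases correspond to the 50%, 0%, and 33% values on the preceding slides.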
Lesser Allele Fraction Plots
Chromosome
Lesser allele fraction
Running number of germline het SNP (N ~ 1.7 million)
Plots are windows of 1000 SNPs, overlapping by 500
Slide68
Lesser Allele Fraction Plots
Slide69
Zoom-In
But ... What is the actual ploidy?
Slide70
Absolute Coverage Pattern of LOH and Gain
Normal
LOH
Gain
Slide71
Absolute Coverage in LOH vs Ploidy Gain
Prevalent allele absolute coverage
Lesser allele absolute coverage
Lesser allele FRACTION
LOH
Gain
?
Slide72
Fractions with normal contribution
G A
Prevalent allele absolute coverage = 7
Lesser allele absolute coverage = 0
G A
Prevalent allele absolute coverage = 14
Lesser allele absolute coverage = 7
Our samples: up to 50% normal (non-tumor) tissue content
Lesser allele fraction = 7/21 = 0.33
Slide73
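The 7/21 arithmetic generalizes to any mixture of tumor and normal cells. The sketch below assumes normal cells carry one copy of each allele and that reads are drawn in proportion to copy number. The parameter names are illustrative.

```python
def mixed_lesser_allele_fraction(tumor_major, tumor_minor, tumor_fraction):
    """Lesser allele fraction for a sample mixing tumor and normal cells.

    tumor_major, tumor_minor: tumor copies of the prevalent and lesser allele.
    tumor_fraction: share of cells in the sample that are tumor cells.
    """
    normal_fraction = 1.0 - tumor_fraction
    # Normal cells are assumed heterozygous with one copy of each allele.
    prevalent = tumor_fraction * tumor_major + normal_fraction * 1
    lesser = tumor_fraction * tumor_minor + normal_fraction * 1
    return lesser / (prevalent + lesser)

loh_half_normal = mixed_lesser_allele_fraction(1, 0, tumor_fraction=0.5)
gain_pure_tumor = mixed_lesser_allele_fraction(2, 1, tumor_fraction=1.0)
print(round(loh_half_normal, 2), round(gain_pure_tumor, 2))
```

Both cases give about 0.33, so the lesser allele fraction alone cannot separate LOH with 50% normal content from a single-copy gain in a pure tumor. Absolute coverage is needed to tell them apart.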
Patient 6 Evolution – Copy Number Changes
Slide Courtesy of Arend Sidow
Slide74
Patient 6 Evolution
Slide Courtesy of Arend Sidow
Slide75
Patient 2
Lymph
Normal
CCL
FEA
IDC
DCIS
Slide76
Patient 2 - normal
Slide77
Patient 2 – CCL and DCIS
16q: 1N (LOH)
1q: 4N (3:1)
X: 1N (LOH)
16p: 3N (2:1)
Slide78
Patient 2 – IDC has same as CCL, DCIS
Slide79
Patient 2 Aneuploidy Evolution
CCL, DCIS
IDC
1q, 16p
16q, X
Slide80
Patient 2 - IDC
Slide81
Patient 2 Aneuploidy Evolution
CCL, DCIS
IDC
IDC’
1q, 16p
16q, X
Major Crisis
Involving all but 6 chromosomes, including 10 whole-chromosome LOHs
No aneuploidies but ...
Slide82
Patient 2 – SNVs
[Figure: Venn diagram of SNVs shared among CCL, IDC, and DCIS; region counts 515, 5, 70, 0, 681, 80, 133]
Allele Freq in CCL
CCL: 894
IDC: 1276
DCIS: 884
Slide83
Patient 2 Evolution
[Figure: Patient 2 lineage tree; branch labels: 1q, 16p, 16q, X; then 1p, 2, 4, 5, 8, 9, 10, 11, 13, 14, 15, 17, 19, 21; SNV counts: 681, 133, 80, 515]
Slide84
Patient Cancer Phylogeny Trees
Slide Courtesy of Arend Sidow
Slide85
Mutational Profiles
Slide86
Automated Inference of Multi-Sample Cancer Phylogenies
SMutH: Somatic Mutation Hierarchies
Victoria Popic
Raheleh Salari
Branched tree model
Sample 1
Sample 2
Sample 3
Slide87
VAF profiles of SNVs across samples
Slide88
VAF profiles of SNVs – Clustering
Slide89
Cell-Lineage VAF Constraint
Edge u → v: “Possibly mutations in u happened before those in v”
Slide90
Tree Construction
For each node u and its children C:
Find all spanning trees that satisfy VAF constraints (extension of Gabow & Myers spanning tree search algorithm)
Rank trees according to their agreement with VAFs
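The exact constraint formulas are not preserved in this transcript. The sketch below therefore assumes one common form: a node u may precede v only if the VAF of u is at least the VAF of v in every sample, and the VAFs of the children of u may not sum to more than the VAF of u in any sample. The tolerance `eps`, the function names, and the example VAF values are all hypothetical.

```python
def edge_ok(vaf_u, vaf_v, eps=0.0):
    """Assumed edge constraint: u's VAF is at least v's VAF in every sample."""
    return all(a + eps >= b for a, b in zip(vaf_u, vaf_v))

def children_ok(vaf_u, child_vafs, eps=0.0):
    """Assumed sum constraint: children of u do not jointly exceed u in any sample."""
    return all(
        parent + eps >= sum(child[i] for child in child_vafs)
        for i, parent in enumerate(vaf_u)
    )

# Hypothetical mean VAFs of three SNV clusters across three samples.
u = (0.50, 0.45, 0.40)
v = (0.30, 0.00, 0.25)
w = (0.15, 0.40, 0.10)

print(edge_ok(u, v), edge_ok(v, u))
print(children_ok(u, [v, w]), children_ok(v, [w]))
```

A tree search would keep only candidate trees in which every edge and every parent with its children pass such checks, and would then rank the survivors.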
Slide91
Simulation Results
Pred: pairs of nodes ordered correctly
Branch: pairs of nodes correctly assigned to separate branches
Shared edges: edges shared between true and reconstructed trees
Slide92
Reconstruction of Lineage Trees in Recent Literature
ccRCC Study of Renal Carcinoma by Gerlinger et al. (2014)
HGSC Study of Ovarian Cancer by Bashashati et al. (2013)
Slide93
Expanded Breast Cancer Lineage Trees
PIK3CA H1047R
PIK3CA H1047L
Slide94
Further Readings for the Curious
Fantastic Cancer Reviews
Hanahan and Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.
Hanahan and Weinberg. 2011. Hallmarks of cancer: the next generation. Cell 144, 646–74.
Reviews of Cancer Genomics
Meyerson, Matthew, Stacey Gabriel, and Gad Getz. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696. doi:10.1038/nrg2841. http://www.nature.com/doifinder/10.1038/nrg2841.
Yates, L. R. & Campbell, P. J. Evolution of the cancer genome. Nat. Rev. Genet. 13, 795–806 (2012).
Variant Calling
Dalca, Adrian V, and Michael Brudno. 2010. Genome variation discovery with high-throughput sequencing data. Briefings in Bioinformatics 11, no. 1 (January): 3-14. http://www.ncbi.nlm.nih.gov/pubmed/20053733.
Medvedev, Paul, Monica Stanciu, and Michael Brudno. 2009. Computational methods for discovering structural variation with next-generation sequencing. Nature Methods 6, no. 11. http://www.nature.com/nmeth/journal/v6/n11s/full/nmeth.1374.html.