
Cancer Sequencing - PowerPoint Presentation

tatiana-dople
Uploaded On 2017-04-21



Presentation Transcript

Slide1

Cancer Sequencing

Slide2

What is Cancer?

Definitions

A class of diseases characterized by malignant growth of a group of cells

Growth is uncontrolled

Invasive and damaging

Often able to metastasize

An instance of such a disease (a malignant tumor)

A disease of the genome

http://en.wikipedia.org/wiki/Cancer

http://faculty.ksu.edu.sa/tatiah/Pictures%20Library/normal%20male%20karyotyping.jpg

Slide3

What is Cancer?

http://www.moffitt.org/CCJRoot/v2n5/artcl2img4.gif

Slide4

Fundamental Changes in Cancer Cell Physiology

Evasion of anti-cancer control mechanisms

Apoptosis (e.g. p53)

Antigrowth signals (e.g. pRb)

Cell Senescence

Hanahan and Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.

Exploitation of natural pathways for cellular growth

Growth Signals (e.g. TGF family)

Angiogenesis

Tissue Invasion & Metastasis

Acceleration of Cellular Evolution Via Genome Instability

DNA Repair

DNA Polymerase

Slide5

Many Paths Lead to Cancer Self-Sufficiency

Hanahan, Douglas, and R. A. Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.

Slide6

Cancer Heterogeneity

Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–13 (2012).

Slide7

Why Sequence Cancer Genomes?

Better understand cancer biology

Pathway information

Types of mutations found in different cancers

Slide8

Why Sequence Cancer Genomes?

Better understand cancer biology

Pathway information

Types of mutations found in different cancers

Cancer Diagnosis

Genetic signatures of cancer types will inform diagnosis

Non-invasive means of detecting or confirming presence of cancer

Improve cancer therapies

Targeted treatment of cancer subtypes

COSMIC Database, v48, July 2010

http://www.sanger.ac.uk/genetics/CGP/cosmic/

Forbes et al. 2011. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Research 39: D945-D950

Samples: 544809

Mutations: 141212

Papers: 10383

Whole Genomes: 29

Slide9

Why Sequence Cancer Genomes?

COSMIC Database, v71, Oct 2014

http://www.sanger.ac.uk/genetics/CGP/cosmic/

Samples: 1058292

Mutations: 2710449

Papers: 20247

Whole Genomes: 15047

Slide10

How Do We Sequence Cancer Genomes?

Slide11

How Do We Sequence Cancer Genomes?

Slide12

Read Mapping

Slide13

Definition of Coverage

Length of genomic segment: L

Number of reads: n

Length of each read: l

Definition: Coverage C = n l / L

How much coverage is enough?

Lander-Waterman model: Assuming uniform distribution of reads, C=10 results in 1 gapped region / 1,000,000 nucleotides
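As a rough illustration of the coverage definition and the Lander-Waterman estimate, the sketch below computes C = n l / L together with the Poisson-style quantities usually derived from it. The function names and the example numbers (20,000 reads of 500 bp over a 1,000,000 bp segment) are assumptions chosen for illustration, and the gap estimate ignores any minimum-overlap requirement between reads.

```python
import math

def coverage(n_reads: int, read_len: int, segment_len: int) -> float:
    """Coverage C = n * l / L."""
    return n_reads * read_len / segment_len

def uncovered_fraction(c: float) -> float:
    """Expected fraction of bases covered by no read, e^-C, assuming uniform reads."""
    return math.exp(-c)

def expected_gaps(n_reads: int, c: float) -> float:
    """Approximate expected number of gaps, n * e^-C (minimum overlap ignored)."""
    return n_reads * math.exp(-c)

# Hypothetical example values, not taken from the slides.
n, l, L = 20_000, 500, 1_000_000
C = coverage(n, l, L)
print(C, uncovered_fraction(C), expected_gaps(n, C))
```

With these assumed numbers C is 10 and the expected number of gaps is a little under one per 1,000,000 nucleotides, which is in line with the figure quoted for C=10.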

Slide14

Read Mapping

BWA

Slide15

Paired-End Read Mapping

Reference

Physical Coverage: 4

Sequence Coverage: 2

Physical coverage refers to the genomic coverage including the unsequenced regions of each DNA fragment

Sequence coverage refers to the genomic coverage counting only the sequenced part of each DNA fragment
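The two definitions can be compared with a small sketch. It assumes every fragment is sequenced from both ends with reads of equal length; the function names and the example numbers are illustrative assumptions, chosen so that the result matches the "Physical Coverage: 4, Sequence Coverage: 2" labels.

```python
def sequence_coverage(n_pairs: int, read_len: int, genome_len: int) -> float:
    """Count only the sequenced bases: two reads per fragment."""
    return n_pairs * 2 * read_len / genome_len

def physical_coverage(n_pairs: int, fragment_len: int, genome_len: int) -> float:
    """Count the whole fragment, including the unsequenced middle."""
    return n_pairs * fragment_len / genome_len

# Hypothetical library: 1,000 fragments of 400 bp, 100 bp read from each end,
# over a 100,000 bp reference.
print(sequence_coverage(1_000, 100, 100_000))   # 2.0
print(physical_coverage(1_000, 400, 100_000))   # 4.0
```

Lengthening the fragments while keeping the read length fixed raises only the physical coverage, which is why the sequencing cost does not change.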

Increased gap length between paired reads provides higher physical coverage without incurring increased costs for sequencing, which is useful for detecting certain types of mutations

Slide16

Considerations for Cancer Sequencing

Factors that affect mutation signal

Limited genetic material (lower depth)

Mixture of tumor and normal tissue

Cancer heterogeneity

Factors that introduce noise

Formalin-fixed and paraffin-embedded (FFPE) samples

Increased number of mutations and unusual genomic rearrangements

General Consideration

Each individual has many unique mutations that could be confused with cancer-causing mutations

Slide17

Human Genome Variation

SNP: TGCTGAGA vs. TGCCGAGA

Microdeletion: TGC--AGA vs. TGCCGAGA

Novel Sequence: TGCTCGGAGA vs. TGC---GAGA

Other variation types shown: Inversion, Mobile Element or Pseudogene Insertion, Translocation, Tandem Duplication, Transposition, Large Deletion, Novel Sequence at Breakpoint

Slide18

Variant Types

Single Nucleotide Variants (SNVs)

Small Insertion / Deletion (indels)

Copy Number Variants (CNVs)

Structural Variants (SVs)

Novel Sequence

Slide19

SNV Calling


A Bayesian approach is the most general and common method of calling SNVs
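To make the Bayesian idea concrete, here is a minimal sketch of a single-site diploid genotype posterior. It is not the model used by any particular tool; the per-base error rate, the genotype priors, and the function name are all assumptions chosen for illustration.

```python
import math

def genotype_posteriors(n_ref, n_alt, error=0.01, priors=None):
    """Posterior over genotypes RR, RA, AA given reference/alternate read counts.

    Assumes independent reads and a single per-base error rate (a simplification).
    """
    if priors is None:
        # Hypothetical priors: most sites match the reference.
        priors = {"RR": 0.998, "RA": 0.0015, "AA": 0.0005}
    # Probability that a read shows the alternate base under each genotype.
    p_alt = {"RR": error, "RA": 0.5, "AA": 1.0 - error}
    log_post = {
        g: math.log(prior)
        + n_alt * math.log(p_alt[g])
        + n_ref * math.log(1.0 - p_alt[g])
        for g, prior in priors.items()
    }
    # Normalize in log space to avoid underflow at high depth.
    top = max(log_post.values())
    unnorm = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}

post = genotype_posteriors(n_ref=10, n_alt=10)
print(max(post, key=post.get))  # genotype with the highest posterior
```

The binomial coefficient is omitted because it is identical for all three genotypes and cancels during normalization.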

MAQ, SOAPsnp, Genome Analysis Toolkit (GATK), SAMtools

Slide20

SNV Calling


http://www.broadinstitute.org/gatk//events/2038/GATKwh0-BP-5-Variant_calling.pdf

Slide21

SNV Calling


A given human genome (germline) differs from the reference genome at millions of positions.

A cancer genome differs from the healthy genome of its host at tens of thousands of positions at most, roughly two orders of magnitude fewer differences than germline versus reference.

How do we distinguish germline mutations from somatic mutations?

Slide22

Somatic SNV calling

Tumor Tissue

Normal Tissue

Compare the alignment results

Most naïve: use a standard SNV caller on both datasets. If a mutation is found in the tumor sample but not in the normal sample, it is somatic!

Slide23

Somatic SNV calling
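The naive comparison of tumor and normal samples (call each sample separately, then keep variants seen only in the tumor) can be sketched as a set difference. The tuple layout `(chromosome, position, ref, alt)` and the function name are assumptions for illustration.

```python
def naive_somatic_calls(tumor_calls, normal_calls):
    """Variants called in the tumor but absent from the matched normal."""
    return sorted(set(tumor_calls) - set(normal_calls))

# Hypothetical call sets: (chromosome, position, reference base, alternate base).
tumor = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
normal = [("chr1", 100, "A", "G")]
print(naive_somatic_calls(tumor, normal))  # [('chr1', 200, 'C', 'T')]
```

One weakness of this approach is that a germline variant missed in the normal sample, for example because of low coverage there, is reported as somatic. Jointly modeling the two samples is meant to reduce such errors.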

Roth, A. et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics 28, 907–13 (2012).

JointSNVMix: probabilistic graphical models for joint tumor-normal SNV calling

Slide24

Short Indel Calling

Reference

Deletion

Insertion

Slide25

Short Indel Calling

Reference

Deletion

Insertion

Read mapping in practice

Unmappable part of read (just the read end)

Unmapped read (could not be aligned anywhere)

Slide26

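Short Indel Calling – Discordant Read Pairs

I) Insertion of length i: the read pair maps to the reference at distance l - i

II) Deletion of length d: the read pair maps to the reference at distance l + d

(l is the expected distance between the paired reads)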
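The discordant-pair signal can be sketched as a comparison between the observed mapping distance of a read pair and the expected distance l. The tolerance threshold and the function name are assumptions for illustration; real callers estimate the insert-size distribution from the data and combine evidence from many pairs.

```python
def classify_pair(mapped_distance: int, expected_distance: int, tolerance: int):
    """Classify one read pair by how far its mapped distance deviates from l.

    Pairs mapping farther apart than expected (l + d) suggest a deletion of
    about d bases in the sample; pairs mapping closer (l - i) suggest an
    insertion of about i bases.
    """
    delta = mapped_distance - expected_distance
    if delta > tolerance:
        return ("deletion", delta)
    if delta < -tolerance:
        return ("insertion", -delta)
    return ("concordant", 0)

# Hypothetical library with expected distance l = 400 and tolerance 30.
print(classify_pair(450, 400, 30))  # ('deletion', 50)
print(classify_pair(350, 400, 30))  # ('insertion', 50)
print(classify_pair(410, 400, 30))  # ('concordant', 0)
```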

Slide27

Short Indel Calling – Split Read Mapping

Reference

Deletion

Read mapping in practice

Slide28

Short Indel Calling – Split Read Mapping

Reference

Deletion

Read mapping in practice

Remap each end of the suspicious reads

Slide29

Paired-end mapping can improve power to detect variants without need for more sequencing

Modified from Meyerson et al. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696

Slide30

Copy Number Variants

Ref: A B C D E F G H I K

A B C D C E F G H C I K

Slide31

Copy Number Variants

Ref: A B C D E F G H I K

A B C D C E F G H C I K

Depth of Coverage

Modified from Dalca and Brudno. 2010. Genome variation discovery with high-throughput sequencing data. Briefings in bioinformatics 11, no. 1: 3-14

Slide32

Copy Number Variants

Problems with DOC (depth of coverage)

Very sensitive to stochastic variance in coverage

Sensitive to coverage bias (e.g. GC content)

Impossible to determine non-reference locations of CNVs

Graph methods using paired-end reads help overcome some of these problems

Slide33

Copy Number Variants - CNAnorm

Gusnanto, A., Wood, H. M., Pawitan, Y., Rabbitts, P. & Berri, S. Correcting for cancer genome size and tumour cell content enables better estimation of copy number alterations from next-generation sequence data. Bioinformatics 28, 40–7 (2012).

Overall steps in the CNAnorm method, a tool for detecting copy number changes in tumor samples

Data: sequence data from tumor and normal samples

Steps:

Count number of reads in fixed windows across the genome

Calculate ratio of reads in tumor vs. reads in normal for each window, correcting for sequence biases (e.g. GC)

Smooth ratio signal across windows

Normalize data

Estimate amount of normal contamination in tumor sample

Perform segmentation on tumor data

Slide34
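A minimal sketch of the first steps in the CNAnorm outline (window counts, tumor/normal ratio, and a simple normalization) is given here. It is not CNAnorm's implementation: GC correction, smoothing, contamination estimation, and segmentation are omitted, median normalization is an assumed stand-in, and all names are hypothetical.

```python
from collections import Counter
from statistics import median

def window_counts(read_starts, window_size, n_windows):
    """Count reads whose start position falls in each fixed-size window."""
    counts = Counter(pos // window_size for pos in read_starts)
    return [counts.get(w, 0) for w in range(n_windows)]

def normalized_ratios(tumor_starts, normal_starts, window_size, genome_len):
    """Tumor/normal read-count ratio per window, divided by the median ratio.

    Windows with no normal reads are reported as None.
    """
    n_windows = -(-genome_len // window_size)  # ceiling division
    tumor = window_counts(tumor_starts, window_size, n_windows)
    normal = window_counts(normal_starts, window_size, n_windows)
    raw = [t / n if n > 0 else None for t, n in zip(tumor, normal)]
    center = median(r for r in raw if r is not None)
    return [r / center if r is not None else None for r in raw]

# Hypothetical data: 10 normal reads per 100 bp window; the tumor has twice
# as many reads in the third window.
normal_reads = [w * 100 + k for w in range(4) for k in range(10)]
tumor_reads = normal_reads + [200 + k for k in range(10, 20)]
print(normalized_ratios(tumor_reads, normal_reads, 100, 400))
```

In this toy example the third window stands out with a ratio of 2.0 against 1.0 elsewhere, which is the kind of signal the later segmentation step would group into a gained region.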

Variant Types

[Figure: structural variants relative to a reference of numbered blocks 1 2 3 4 5 6 7 8, with panels labeled Structural Rearrangement, Translocation, Inversion, and Large Insertion / Deletion]

Slide35

Summary of Variant Types

Meyerson et al. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696

Slide36

Passenger Mutations and Driver Mutations

[Figure: normal versus cancer sequence, with mutations marked X]

Driver or Passenger?

Greaves, M. & Maley, C. C. Clonal evolution in cancer. Nature 481, 306–13 (2012).

Slide37

Passenger Mutations and Driver Mutations

Stratton, Michael R, Peter J Campbell, and P Andrew Futreal. 2009. The cancer genome. Nature 458, no. 7239 (April): 719-24. doi:10.1038/nature07943

Slide38

Passenger Mutations and Driver Mutations

Distinguishing Features

Presence in many tumors

Predicted to have functional impact on the cell

Conserved

Not seen in healthy adults (rare)

Predicted to affect protein structure

In pathways known to be involved in cancer

Train Classifier using Machine Learning Approaches

Carter et al. 2009. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer research, no. 16: 6660-6667

Slide39

Tracking the Evolution of Cancer

Slide40

Models of Breast Cancer Progression

Slide41

Models of Breast Cancer Progression

Slide42

What we did

Cancer phylogenetics

Lineage relationship of neoplastic lesions with cancers using somatic SNVs as lineage markers

Order of genomic events and drivers

Slide Courtesy of Arend Sidow

Slide43

Samples

P1 P2 P3 P4 P5 P6

Lymph, Normal, CCL, FEA, DCIS, IDC

Side 1

Side 2

All samples are FFPE material

Slide Courtesy of Arend Sidow

Slide44

Samples

Slide45

Patient 1 Evolution – SNVs

Code: 0101

Human genome reference: GGATAGTGTCCATGGCAAA

Reads from sequencing patient sample: GGATAG, ATAGCG, GCGTCC, TAGCGT, CCATGG, CATGGC, CATGGC, GGCAAA

Samples: Normal sample, Early neoplasia (EN) sample, EN with atypia (ENA) sample, Invasive ductal carcinoma (IDC) sample

Slide46

Patient 1 Evolution – SNVs

Multisample SNV Code: 1 0 0 0 (Normal, EN, ENA, IDC)

Slide47

Patient 1 Evolution – SNVs

Multisample SNV Code: 0 1 1 0 (Normal, EN, ENA, IDC)

Slide48

Patient 1 Evolution – SNVs
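The multisample SNV code used on these slides has one digit per sample in the order Normal, EN, ENA, IDC, with 1 meaning the SNV was called in that sample. A minimal sketch of the encoding and of tallying SNVs per code follows; the function names and the example SNV identifiers are hypothetical.

```python
from collections import Counter

SAMPLES = ["Normal", "EN", "ENA", "IDC"]

def snv_code(present_in):
    """Encode the set of samples carrying an SNV as a string such as '0101'."""
    return "".join("1" if sample in present_in else "0" for sample in SAMPLES)

def tally_codes(snvs):
    """Count SNVs per presence/absence code."""
    return Counter(snv_code(samples) for samples in snvs.values())

# Hypothetical SNVs mapped to the samples in which they were called.
snvs = {
    "snv_a": {"EN", "IDC"},
    "snv_b": {"ENA", "IDC"},
    "snv_c": {"ENA", "IDC"},
    "snv_d": {"Normal"},
}
print(snv_code({"EN", "IDC"}))  # 0101
print(tally_codes(snvs))
```

Codes shared between samples, such as 0011, are the ones that serve as lineage markers, since they indicate mutations inherited from a common ancestor of those lesions.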

Code   Normal   EN    ENA   IDC   SUM
1000   89       -     -     -     89
0100   -        147   -     -     147
0010   -        -     102   -     102
0001   -        -     -     46    46
0011   -        -     755   755   755

Slide49

Patient 1 Evolution – SNVs

Venn diagram view: Normal 89, EN 147, ENA 102, IDC 46, shared by ENA and IDC 755

Slide50

Patient 1 Evolution – SNVs

P1

Slide51

Patient 6 Evolution - SNVs

Code     Lymph   CCL_CL   CCL   FEA   DCIS   IDC   SUM
010000   -       219      -     -     -      -     219
001000   -       -        305   -     -      -     305
000100   -       -        -     345   -      -     345
000010   -       -        -     -     978    -     978
000001   -       -        -     -     -      608   608
000101   -       -        -     61    -      61    61
000110   -       -        -     185   185    -     185
000111   -       -        -     510   510    510   510
01XXXX                                             0
010101                                             0
000011                                             0
Total                                              3211

Slide52

Lineage concepts

Somatic changes

Slide53

Cell Divisions in One Generation

~40 cell divisions

~60 new point mutations

Slide54

“Germline”

D?

You

Your dad

Slide55

“Germline”

You

Your dad

Mutations detected here ... but not here ...

D?

Slide56

Somatic (Tumor) Lineages

Sampled lesion

Slide57

Somatic (Tumor) Lineages

Sampled lesion

Slide58

Somatic (Tumor) Lineages

Sampled lesion 1

Sampled lesion 2

Slide59

Patient 6 Evolution - SNVs

Slide60

Patient 6 Evolution

Slide Courtesy of Arend Sidow

Slide61

Patient 6 Evolution – Copy Number Changes

Slide Courtesy of Arend Sidow

Slide62

Aneuploidies

Putting the germline SNPs to good use (no somatic SNVs for this!)

Slide63

Heterozygous Positions

m: G A T

p: A T C

50% 50% 50%

Slide64

LOH (e.g., paternal chromosome)

m: G A T

p: A T C

Fraction of “lesser allele”: 0% 0% 0%

Slide65

Chromosome Duplication

m: G A T

p: A T C

A T C

66% 66% 66%

Slide66

Chromosome Duplication

m: G A T

p: A T C

A T C

Fraction of “lesser allele”: 33% 33% 33%

Slide67

Lesser Allele Fraction Plots

Chromosome

Lesser allele fraction

Running number of germline het SNP (N ~ 1.7 million)

Plots are windows of 1000 SNPs, overlapping by 500

Slide68

Lesser Allele Fraction Plots

Slide69

Zoom-In

But ... What is the actual ploidy?

Slide70

Absolute Coverage Pattern of LOH and Gain

Normal

LOH

Gain

Slide71

Absolute Coverage in LOH vs Ploidy Gain

Prevalent allele absolute coverage

Lesser allele absolute coverage

Lesser allele FRACTION

LOH

Gain

?

Slide72

Fractions with normal contribution

G A: Prevalent allele absolute coverage = 7, Lesser allele absolute coverage = 0

G A: Prevalent allele absolute coverage = 14, Lesser allele absolute coverage = 7

Our samples: up to 50% normal (non-tumor) tissue content
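The effect of normal tissue on the lesser allele fraction can be sketched as a mixture of tumor and normal cells. The sketch assumes normal cells carry one copy of each allele and that coverage is proportional to copy number; the function name and parameters are illustrative assumptions.

```python
def lesser_allele_fraction(tumor_a: int, tumor_b: int, normal_fraction: float = 0.0) -> float:
    """Lesser allele fraction at a germline heterozygous site.

    tumor_a, tumor_b: copies of each allele in tumor cells.
    normal_fraction: share of cells in the sample that are normal (1 copy each).
    """
    a = (1.0 - normal_fraction) * tumor_a + normal_fraction * 1
    b = (1.0 - normal_fraction) * tumor_b + normal_fraction * 1
    total = a + b
    if total == 0:
        raise ValueError("no copies of either allele in the sample")
    return min(a, b) / total

print(lesser_allele_fraction(1, 1))        # heterozygous: 0.5
print(lesser_allele_fraction(1, 0))        # LOH, pure tumor: 0.0
print(lesser_allele_fraction(2, 1))        # duplication, pure tumor: about 0.33
print(lesser_allele_fraction(1, 0, 0.5))   # LOH with 50% normal: about 0.33
```

Under these assumptions, LOH in a sample with 50% normal content yields the same fraction as a duplication in pure tumor, which is why the fraction alone cannot settle the ploidy and the absolute coverage is examined as well.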

Lesser allele fraction = 7/21 = 0.33

Slide73

Patient 6 Evolution – Copy Number Changes

Slide Courtesy of Arend Sidow

Slide74

Patient 6 Evolution

Slide Courtesy of Arend Sidow

Slide75

Patient 2

Lymph

Normal

CCL

FEA

IDC

DCIS

Slide76

Patient 2 - normal

Slide77

Patient 2 – CCL and DCIS

16q: 1N (LOH)

1q: 4N (3:1)

X: 1N (LOH)

16p: 3N (2:1)

Slide78

Patient 2 – IDC has same as CCL, DCIS

Slide79

Patient 2 Aneuploidy Evolution

CCL, DCIS

IDC

1q, 16p

16q, X

Slide80

Patient 2 - IDC

Slide81

Patient 2 Aneuploidy Evolution

CCL, DCIS

IDC

IDC’

1q, 16p

16q, X

Major Crisis

Involving all but 6 chromosomes, including 10 whole-chromosome LOHs

No aneuploidies but ...

Slide82

Patient 2 – SNVs

SNV counts: 515, 5, 70, 0, 681, 80, 133

Allele Freq in CCL

CCL: 894

IDC: 1276

DCIS: 884

Slide83

Patient 2 Evolution

1q, 16p

16q, X

1p, 2, 4, 5, 8, 9, 10, 11, 13, 14, 15, 17, 19, 21

681, 133, 80, 515

Slide84

Patient Cancer Phylogeny Trees

Slide Courtesy of Arend Sidow

Slide85

Mutational Profiles

Slide86

SMutH: Somatic Mutation Hierarchies

Automated Inference of Multi-Sample Cancer Phylogenies

Victoria Popic, Raheleh Salari

Branched tree model

Sample 1, Sample 2, Sample 3

Slide87

VAF profiles of SNVs across samples

Slide88

VAF profiles of SNVs – Clustering

Slide89

Edge u → v: Cell-Lineage VAF Constraint

“Possibly mutations in u happened before those in v”

Slide90

Tree Construction

For each node u and its children C: [constraint equation not preserved in this transcript]

Find all spanning trees that satisfy VAF constraints (extension of Gabow & Myers spanning tree search algorithm)

Rank trees according to their agreement with VAFs
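The constraint equations themselves are not in the transcript, so the sketch below rests on an assumption about their form: in every sample, the VAFs of the children of a node should sum to no more than the VAF of the node itself, up to a noise margin. This is a common formulation for lineage trees and may differ from the exact constraint used in SMutH; the function name, the data layout, and the `eps` margin are hypothetical.

```python
def satisfies_vaf_constraints(tree, vaf, eps=0.05):
    """Check an assumed VAF constraint on a candidate lineage tree.

    tree: dict mapping each parent node to a list of its children.
    vaf:  dict mapping each node to a list of VAFs, one per sample.
    For every parent and every sample, the children's VAFs must sum to at
    most the parent's VAF plus eps.
    """
    for parent, children in tree.items():
        for s in range(len(vaf[parent])):
            child_sum = sum(vaf[child][s] for child in children)
            if child_sum > vaf[parent][s] + eps:
                return False
    return True

# Hypothetical SNV groups u, v, w with VAFs in two samples.
tree = {"u": ["v", "w"]}
ok_vaf = {"u": [0.5, 0.5], "v": [0.3, 0.2], "w": [0.2, 0.3]}
bad_vaf = {"u": [0.5, 0.5], "v": [0.3, 0.2], "w": [0.3, 0.3]}
print(satisfies_vaf_constraints(tree, ok_vaf))   # True
print(satisfies_vaf_constraints(tree, bad_vaf))  # False
```

With non-negative VAFs, the sum condition also implies that each child's VAF does not exceed its parent's, which matches the reading that mutations in u happened before those in v.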

Slide91

Simulation Results

Pred: pairs of nodes ordered correctly

Branch: pairs of nodes correctly assigned to separate branches

Shared edges: edges shared between true and reconstructed trees

Slide92

Reconstruction of Lineage Trees in Recent Literature

ccRCC Study of Renal Carcinoma by Gerlinger et al. (2014)

HGSC Study of Ovarian Cancer by Bashashati et al. (2013)

Slide93

Expanded Breast Cancer Lineage Trees

PIK3CA H1047R

PIK3CA H1047L

Slide94

Further Readings for the Curious

Fantastic Cancer Reviews

Hanahan and Weinberg. 2000. The hallmarks of cancer. Cell 100: 57-70.

Hanahan and Weinberg. 2011. Hallmarks of cancer: the next generation. Cell 144: 646-74.

Reviews of Cancer Genomics

Meyerson, Matthew, Stacey Gabriel, and Gad Getz. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics 11, no. 10 (October): 685-696. doi:10.1038/nrg2841. http://www.nature.com/doifinder/10.1038/nrg2841.

Yates, L. R. & Campbell, P. J. Evolution of the cancer genome. Nat. Rev. Genet. 13, 795–806 (2012).

Variant Calling

Dalca, Adrian V, and Michael Brudno. 2010. Genome variation discovery with high-throughput sequencing data. Briefings in bioinformatics 11, no. 1 (January): 3-14. http://www.ncbi.nlm.nih.gov/pubmed/20053733.

Medvedev, Paul, Monica Stanciu, and Michael Brudno. 2009. Computational methods for discovering structural variation with next-generation sequencing. Nature Methods 6, no. 11. http://www.nature.com/nmeth/journal/v6/n11s/full/nmeth.1374.html.