
Proteogenomics - PowerPoint Presentation

alexa-scheidler
Uploaded On 2016-08-01



Presentation Transcript

Slide1

Proteogenomics

Kelly Ruggles, Ph.D.

Proteomics Informatics

Week 9

Slide2

Proteogenomics: Intersection of proteomics and genomics

As the cost of high-throughput genome sequencing goes down, whole genome, exome and RNA sequencing can be easily attained for most proteomics experiments

In combination with mass spectrometry-based proteomics, sequencing can be used for:

Genome annotation
Studying the effect of genomic variation in proteome
Biomarker identification

Slide3

Proteogenomics: Intersection of proteomics and genomics

First published in 2004: “Proteogenomic mapping as a complementary method to perform genome annotation” (Jaffe JD, Berg HC and Church GM), using genomic sequencing to better annotate Mycoplasma pneumoniae

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Slide4

Proteogenomics

In the past, computational algorithms were commonly used to predict and annotate genes.

Limitations: short genes are missed, alternative splicing prediction is difficult, transcription vs. translation (cDNA predictions)

With mass spectrometry we can:

Confirm existing gene models
Correct gene models
Identify novel genes and splice isoforms

Essentials for Proteogenomics

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Slide5

Proteogenomics

Genome annotation

Studying the effect of genomic variation in proteome

Proteogenomic mapping

Slide6

Proteogenomics

Genome annotation

Studying the effect of genomic variation in proteome

Proteogenomic mapping

Slide7

Proteogenomics Workflow

Krug K, Nahnsen S, Macek B, Molecular Biosystems 2010

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Slide8

Protein Sequence Databases

Identification of peptides from MS relies heavily on the quality of the protein sequence database (DB)

DBs with missing peptide sequences will fail to identify the corresponding peptides

DBs that are too large will have low sensitivity

Ideal DB is complete and small, containing all proteins in the sample and no irrelevant sequences

Slide9

Genome Sequence-based database for genome annotation

[Figure: MS/MS spectra (m/z vs. intensity) are searched two ways. Reference protein DB: compare, score, test significance, giving annotated peptides. 6-frame translation of genome sequence: compare, score, test significance, giving annotated + novel peptides.]

Slide10

Creating 6-frame translation database

ATGAAAAGCCTCAGCCTACAGAAACTCTTTTAATATGCATCAGTCAGAATTTAAAAAAAAAATC

Positive Strand

MKSLSLQKLF*YASVRI*KKN
*KASAYRNSFNMHQSEFKKKI
EKPQPTETLLICISQNLKKKS

Negative Strand

HFAEA*LFEKLIC*DSNLFFI
SFG*GVSVRKIHML*FKFFFD
FLRLRCFSK*YADTLI*FFFG
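The six-frame translation shown on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the method used by Peppy, BCM Search Launcher or InsPecT. The function names and the `min_length=7` cutoff are assumptions made for the example, and trailing partial codons are dropped.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Sequence from the slide, split into chunks only for readability.
SLIDE_SEQ = (
    "ATGAAAAGCC" "TCAGCCTACA" "GAAACTCTTT" "TAATATGCAT"
    "CAGTCAGAAT" "TTAAAAAAAA" "AATC"
)


def reverse_complement(dna):
    """Return the reverse complement, i.e. the negative strand read 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; '*' marks a stop, a partial last codon is dropped."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Map frame labels ('+1'..'+3', '-1'..'-3') to translated protein strings."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def orf_segments(protein, min_length=7):
    """Split a frame at stop codons; keep segments of at least min_length (arbitrary)."""
    return [seg for seg in protein.split("*") if len(seg) >= min_length]


frames = six_frame_translation(SLIDE_SEQ)
for label, protein in frames.items():
    print(label, protein)
```

The three positive-strand frames match the slide, except that frame +3 here stops before the incomplete final codon. The slide appears to print the negative-strand frames aligned under the forward sequence, so each one reads as the reverse of the corresponding `-1`, `-2` or `-3` string produced here.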

Software:

Peppy: creates the database + searches MS, Risk BA, et al. (2013)
BCM Search Launcher: web-based, Smith et al. (1996)
InsPecT: perl script, Tanner et al. (2005)

Slide11

Genome Annotation Example 1: A. gambiae

Peptides mapping to annotated 3’ UTR

Peptides mapping to novel exon within an existing gene

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Slide12

Genome Annotation Example 1: A. gambiae

Peptides mapping to unannotated gene

Related strain

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Slide13

Genome Annotation Example 2: Correcting Mis-annotations

Currently annotated genes
Peptide mapping to nucleic acid sequence
Manual validation of mis-annotation

A. Hypothetical protein confirmed
B. Confirm unannotated gene
C. Initiation codon is downstream
D. Initiation codon is upstream
E. Peptides indicate the gene frame is wrong
F. Peptides indicate that gene is on wrong strand
G. In-frame stop codon or frameshift found

Armengaud J, Curr. Opin. Microbiology 12(3) 2009

Slide14

RNA Sequence-based database for alternative splicing identification

[Figure: MS/MS spectra (m/z vs. intensity); RNA-Seq junction DB; compare, score, test significance; identification of novel splice isoforms.]

Slide15

Annotation of organisms which lack genome sequencing

[Figure: MS/MS spectra (m/z vs. intensity); de novo MS/MS sequencing; reference DB of related species; compare, score, test significance; identification of potential protein coding regions.]

Slide16

Proteogenomics: Genome Annotation Summary

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Slide17

Proteogenomic Genome Annotation Summary

Renuse S, Chaerkady R and A Pandey, Proteomics. 11(4) 2011

Slide18

Proteogenomics

Genome annotation

Studying the effect of genomic variation in proteome

Proteogenomic mapping

Slide19

Single nucleotide variant database for variant protein identification

[Figure: variants predicted from genome sequencing (reads TCGAGAGCTG aligned over the reference TCGATAGCTG in Exon 1) form a variant DB, used together with the reference protein DB. MS/MS spectra (m/z vs. intensity); compare, score, test significance; identification of variant proteins.]

Slide20

Creating variant sequence DB

VCF File Format

# Meta-information lines

Columns:

1. Chromosome
2. Position
3. ID (ex: dbSNP)
4. Reference base
5. Alternative allele
6. Quality score
7. Filter (PASS=passed filters)
8. Info (ex: SOMATIC, VALIDATED..)

Slide21

Creating variant sequence DB

…GTATTGCAAAAATAAGATAGAATAAGAATAATTACGACAAGATTC…

…CTATTGCAAAAATACGATAGCATAAGAATAGTTACGACAAGATTC…

Add in variants within exon boundaries

In silico translation

EXON 1

EXON 2

…LLQKYDSIRIVTTRF…
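The two steps on this slide, adding variants inside exon boundaries and translating in silico, can be sketched as follows. This is a hedged illustration only: the chromosome name, the coordinates, the VCF lines and the function names are invented for the example, the slide fragment is treated as one in-frame stretch, and only single-nucleotide variants marked PASS are applied.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
NUCLEOTIDES = set("ACGT")


def translate(dna):
    """Translate complete codons; '*' marks a stop codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def parse_vcf_line(line):
    """Parse the eight fixed, tab-separated VCF columns of one data line."""
    chrom, pos, var_id, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    return {
        "chrom": chrom, "pos": int(pos), "id": var_id, "ref": ref,
        "alt": alt.split(","), "qual": qual, "filter": flt, "info": info,
    }


def apply_snvs(exon_seq, exon_chrom, exon_start, records):
    """Substitute PASS single-nucleotide variants that fall inside one exon.

    exon_start is the 1-based coordinate of the exon's first base, matching
    the 1-based POS column of VCF. Only the first ALT allele is used.
    """
    seq = list(exon_seq)
    for rec in records:
        offset = rec["pos"] - exon_start
        alt = rec["alt"][0]
        if rec["chrom"] != exon_chrom or rec["filter"] != "PASS":
            continue
        if rec["ref"] not in NUCLEOTIDES or alt not in NUCLEOTIDES:
            continue  # indels and missing alleles are skipped in this sketch
        if not 0 <= offset < len(seq):
            continue  # variant lies outside the exon boundaries
        if seq[offset] != rec["ref"]:
            raise ValueError(f"REF mismatch at {rec['chrom']}:{rec['pos']}")
        seq[offset] = alt
    return "".join(seq)


# Reference fragment from the slide; coordinates below are hypothetical.
REFERENCE_EXON = (
    "GTA TTG CAA AAA TAA GAT AGA ATA AGA ATA ATT ACG ACA AGA TTC".replace(" ", "")
)
VCF_LINES = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1001\t.\tG\tC\t50\tPASS\tSOMATIC",
    "chr1\t1015\t.\tA\tC\t60\tPASS\tSOMATIC",
    "chr1\t1021\t.\tA\tC\t55\tPASS\tVALIDATED",
    "chr1\t1031\t.\tA\tG\t45\tPASS\tSOMATIC",
    "chr1\t1040\t.\tA\tT\t5\tLowQual\t.",  # fails the filter, so it is ignored
]

records = [parse_vcf_line(line) for line in VCF_LINES if not line.startswith("#")]
variant_exon = apply_snvs(REFERENCE_EXON, "chr1", 1001, records)
variant_peptide = translate(variant_exon)
print(variant_exon)
print(variant_peptide)
```

With these made-up records the substituted sequence equals the second sequence on the slide, and its translation is the peptide shown there. A real variant database would also need to handle indels, multiple alleles, strand and exon phase, which this sketch leaves out.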

Variant DB

Slide22

Splice junction database for novel exon, alternative splicing identification

[Figure: intron/exon boundaries from RNA sequencing form an RNA-Seq junction DB, used together with the reference protein DB. Panels: Alt. Splicing (Exon 1, Exon 2, Exon 3) and Novel Expression (Exon 1, Exon X, Exon 2). MS/MS spectra (m/z vs. intensity); compare, score, test significance; identification of novel splice proteins.]

Slide23

Creating splice junction DB

BED File Format

Columns:

1. Chromosome
2. Chromosome Start
3. Chromosome End
4. Name
5. Score
6. Strand (+ or -)
7-9. Display info
10. # blocks (exons)
11. Size of blocks
12. Start of blocks

Slide24

Creating splice junction DB

Junction bed file

Map to known intron/exon boundaries

1. Annotated Splicing
2. Unannotated alternative splicing
3. One end matches, one within exon
4. One end matches, one within intron
5. No matching exons

Bed file with new gene mapping

[Figure: each of the five cases drawn as a junction against Exon 1, Exon 2, Exon 3 and an intronic region.]
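The five junction categories listed on this slide can be sketched as a small classifier. This is an illustrative assumption rather than the procedure of any named tool: it assumes two-block BED12 junction records whose blocks flank the intron, a single gene given as sorted half-open exon intervals, and it treats any position that is not inside an exon as intronic. All function names and coordinates are hypothetical.

```python
def parse_junction_bed(line):
    """Extract the intron implied by a two-block BED12 junction record.

    BED coordinates are 0-based and half-open. The intron starts where the
    first block ends and stops where the second block begins.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, chrom_start, strand = fields[0], int(fields[1]), fields[5]
    block_sizes = [int(x) for x in fields[10].rstrip(",").split(",")]
    block_starts = [int(x) for x in fields[11].rstrip(",").split(",")]
    intron_start = chrom_start + block_sizes[0]
    intron_end = chrom_start + block_starts[1]
    return chrom, intron_start, intron_end, strand


def classify_junction(intron_start, intron_end, exons):
    """Assign a junction to one of the five categories from the slide.

    exons is a sorted list of (start, end) half-open intervals for one gene.
    """
    exon_ends = {end for _, end in exons}
    exon_starts = {start for start, _ in exons}
    annotated = {(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)}

    start_ok = intron_start in exon_ends
    end_ok = intron_end in exon_starts
    if (intron_start, intron_end) in annotated:
        return "annotated splicing"
    if start_ok and end_ok:
        return "unannotated alternative splicing"
    if start_ok or end_ok:
        other = intron_end if start_ok else intron_start
        # Anything not strictly inside an exon is called intronic here.
        if any(start < other < end for start, end in exons):
            return "one end matches, one within exon"
        return "one end matches, one within intron"
    return "no matching exons"


# Hypothetical gene model and junction record, for illustration only.
KNOWN_EXONS = [(100, 200), (300, 400), (500, 600)]
BED_LINE = "chr1\t150\t350\tJUNC1\t12\t+\t150\t350\t255,0,0\t2\t50,50\t0,150"

chrom, intron_start, intron_end, strand = parse_junction_bed(BED_LINE)
print(chrom, intron_start, intron_end, strand,
      classify_junction(intron_start, intron_end, KNOWN_EXONS))
```

Junctions in categories 2 to 5 are the ones that would add new sequences to a splice junction database, since category 1 is already covered by the reference proteins.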

Slide25

Fusion protein identification

[Figure: Gene X (Exon 1, Exon 2) on Chr 1 and Gene Y (Exon 1, Exon 2) on Chr 2 are joined as Gene X Exon 1 + Gene Y Exon 2. The fusion gene DB is used together with the reference protein DB. MS/MS spectra (m/z vs. intensity); compare, score, test significance; identification of variant proteins.]

Slide26

Fusion Genes

Fusion Location

.…AGAACTGGAAGAATTGG*AATGGTAGATAACGCAGATCATCT..…

Find consensus sequence

6 frame translation FASTA

Slide27

Informatics tools for customized DB creation

QUILTS: perl/python based tool to generate DB from genomic and RNA sequencing data (Fenyo lab)
customProDB: R package to generate DB from RNA-Seq data (Zhang B, et al.)
Splice-graph database creation (Bafna V. et al.)

Slide28

Proteogenomics and Human Disease: Genomic Heterogeneity

Whole genome sequencing has uncovered millions of germline variants between individuals

Genomic and proteomic studies typically use a reference database to model the general population, masking patient-specific variation

Nature, October 28, 2010

Slide29

Proteogenomics and Human Disease: Cancer Proteomics

Cancer is characterized by altered expression of tumor drivers and suppressors

Results from gene mutations causing changes in protein expression, activity
Can influence diagnosis, prognosis and treatment

Cancer proteomics

Are genomic variants evident at the protein level?
What is their effect on protein function?
Can we classify tumors based on protein markers?

Slide30

Tumor Specific Proteomic Variation

Stephens, et al. Complex landscape of somatic rearrangement in human breast cancer genomes. Nature 2009

Nature, April 15, 2010

Slide31

Personalized Database for Protein Identification

[Figure: MS/MS spectra (m/z vs. intensity) are searched against the protein DB: compare, score, test significance.]

Somatic Variants

SVATGSSEAAGGASGGGAR

GQVAGTMKIEIAQYR

DSGSYGQSGGEQQR

EETSDFAEPTTCITNNQHS

EPRDPR

FIKGWFCFIISAR….

Germline Variants

MQYAPNTQVEIIPQGR

SSAEVIAQSR

ASSSIIINESEPTTNIQIR

QRAQEAIIQISQAISIMETVK

SSPVEFECINDK

SPAPGMAIGSGR…

Identified peptides and proteins

Slide32

Personalized Database for Protein Identification

[Figure: RNA-Seq and genome sequencing add tumor specific and patient specific peptides to a tumor specific protein DB. MS/MS spectra (m/z vs. intensity); compare, score, test significance; identified peptides and proteins.]

Slide33

Tumor Specific Protein Databases

[Figure: building the tumor specific protein DB. Non-tumor sample: genome sequencing, identify germline variants. Tumor sample: genome sequencing and RNA-Seq, identify alternative splicing, somatic variants and novel expression. Reference Human Database (Ensembl). Panels: Variants (reads TCGAGAGCTG aligned over the reference TCGATAGCTG in Exon 1); Alt. Splicing (Exon 1, Exon 2, Exon 3); Novel Expression (Exon 1, Exon X, Exon 2); Fusion Genes (Gene X Exon 1 and Exon 2, Gene Y Exon 1 and Exon 2, joined as Gene X / Gene Y).]

Slide34

Proteogenomics and Biomarker Discovery

Tumor-specific peptides identified by MS can be used as sensitive drug targets or diagnostic tools

Fusion proteins
Protein isoforms
Variants

Effects of genomic rearrangements on protein expression can elucidate cancer biology

Slide35

Proteogenomics

Genome annotation

Studying the effect of genomic variation in proteome

Proteogenomic mapping

Slide36

Proteogenomic mapping

Map back observed peptides to their genomic location.

Use to determine:

Exon location of peptides
Proteotypic
Novel coding region
Visualize in genome browsers
Quantitative comparison based on genomic location

Slide37

Informatics tools for proteogenomic mapping

PGx: python-based tool, maps peptides back to genomic coordinates using user defined reference database (Fenyo lab)

The Proteogenomic Mapping Tool: Java-based search of peptides against 6-reading frame sequence database (Sanders WS, et al.)

Slide38

PGX: Proteogenomic mapping tool

Peptides

Sample specific protein database

Peptides mapped onto genomic coordinates
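The idea of mapping a peptide back to genomic coordinates can be sketched as below. This is not PGx's implementation. It is a minimal illustration that assumes a plus-strand gene, coding exons given as 0-based half-open intervals in transcript order, and a coding sequence that begins at the first base of the first interval. The function name and all coordinates are hypothetical.

```python
def map_peptide_to_genome(peptide, protein, cds_exons):
    """Return genomic (start, end) blocks covered by the peptide's codons.

    cds_exons lists the coding parts of the exons as 0-based half-open
    genomic intervals on the plus strand, in 5'->3' order. A peptide that
    spans a splice junction yields more than one block. Only the first
    occurrence of the peptide in the protein is mapped.
    """
    aa_start = protein.find(peptide)
    if aa_start < 0:
        return []  # peptide is not part of this protein
    nt_start = 3 * aa_start                 # offset within the coding sequence
    nt_end = nt_start + 3 * len(peptide)

    blocks = []
    consumed = 0  # coding bases contributed by the exons visited so far
    for exon_start, exon_end in cds_exons:
        exon_len = exon_end - exon_start
        lo = max(nt_start, consumed)
        hi = min(nt_end, consumed + exon_len)
        if lo < hi:
            blocks.append((exon_start + lo - consumed, exon_start + hi - consumed))
        consumed += exon_len
    return blocks


# Hypothetical protein (30 coding bases) split over two coding exons.
PROTEIN = "MKSLSLQKLF"
CDS_EXONS = [(1000, 1012), (2000, 2018)]

for pep in ("SLQK", "KSLS"):
    print(pep, map_peptide_to_genome(pep, PROTEIN, CDS_EXONS))
```

Blocks in this form translate directly into the block size and block start columns of a BED record, which is one way mapped peptides could be loaded into a genome browser alongside other tracks.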

Manor Askenazi
David Fenyo

[Figure: genome-wide tracks labeled Log Fold Change in Expression (10,000 bp bins), Copy Number Variation, Methylation Status, Exon Expression (RNA-Seq), Number of Genes/Bin, Peptides.]

Slide39

Variant Peptide Mapping

Peptides with single amino acid changes corresponding to germline and somatic variants

SVATGSSEAAGGASGGGAR
SVATGSSETAGGASGGGAR

ACG->GCG

Exon Skipping

Unannotated Exons

[Figure: genome browser view with tracks labeled ENSEMBL Gene, Tumor Peptide, Reference Peptide.]

Slide40

Novel Peptide Mapping

Peptides corresponding to RNA-Seq expression in non-coding regions

[Figure: genome browser view with tracks labeled ENSEMBL Gene, Tumor Peptide, Tumor RNA-Seq.]

Slide41

Proteogenomic integration

Maps genomic, transcriptomic and proteomic data to same coordinate system including quantitative information

[Figure: tracks labeled Variants, Proteomic Quantitation, RNA-Seq Data, Proteomic Mapping, Predicted gene expression.]

Slide42

Questions?
