
Previous Lecture: Regression and Correlation - PowerPoint Presentation

carla
Uploaded On 2022-06-18

Introduction to Biostatistics and Bioinformatics: Proteomics Informatics. Learning objectives: structure of mass spectrometry data, protein identification. ID: 920193



Presentation Transcript

Slide1

Previous Lecture:

Regression and Correlation

Slide2

Introduction to Biostatistics and Bioinformatics

Proteomics Informatics

This Lecture

Slide3

Proteomics Informatics – Learning Objectives

Structure of mass spectrometry data

Protein identification

Protein quantitation

Slide4

Protein Identification and Quantitation by Mass Spectrometry

Diagram labels: Samples; Peptides; Mass Spectrometry (m/z, intensity); Identity; Quantity

Slide5

Mass spectrometry

Lysis

Fractionation

Sample preparation for protein identification, characterization and quantitation

Digestion

Slide6

Overview of Mass spectrometry

Ion Source

Mass Analyzer

Detector

mass/charge

intensity

Slide7

Mass Spectrometry (MS)

Slide8

Example data – MALDI-TOF

Peptide intensity vs m/z

Slide9

Peptide Fragmentation

Mass Analyzer 1

Fragmentation

Detector

Ion Source

Mass Analyzer 2

b

y

Slide10

Liquid Chromatography (LC)-MS/MS

Components: LC, Ion Source, Mass Analyzer 1, Fragmentation, Mass Analyzer 2, Detector

[Figure: a series of spectra (intensity vs. mass/charge) acquired over time]

Slide11

Example data – ESI-LC-MS/MS

[Figure, MS panel: peptide intensity vs. m/z vs. time]

[Figure, MS/MS panel: fragment intensity (% relative abundance, 0 to 100) vs. m/z (250 to 1000); labeled peaks: [M+2H]2+, 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080]

Slide12

Charge-State Distributions

[Figure: charge-state distributions (intensity vs. mass/charge) for a peptide and for a protein, under MALDI and under ESI; the labeled charge states run from 1+ to 4+ in one pair of panels and include 1+ to 5+ and 27+ to 31+ in the other]

m/z = (M + n·H) / n

M - molecular mass

n - number of charges

H - mass of a proton

Slide13

Charge-State

Example: a peptide of mass 898

carrying 1 H+: (898 + 1) / 1 = 899 m/z

carrying 2 H+: (898 + 2) / 2 = 450 m/z

carrying 3 H+: (898 + 3) / 3 = 300.3 m/z

M - molecular mass

n - number of charges

H - mass of a proton
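The charge-state arithmetic can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the function name mz is hypothetical, and the proton mass defaults to 1 Da to match the rounding used in the example (a more realistic value is about 1.00728 Da).

```python
def mz(mass, charge, proton_mass=1.0):
    """Return m/z for a molecule of neutral mass `mass` carrying `charge` protons.

    proton_mass defaults to 1.0 Da, the rounded value used in the slide example.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (mass + charge * proton_mass) / charge


# Reproduce the slide example for a peptide of mass 898.
for n in (1, 2, 3):
    print(f"{n}+ : {mz(898, n):.1f} m/z")
```

Passing proton_mass=1.00728 shifts each value slightly; the rounded default is used here only so that the output lines up with the example.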

Slide14

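Isotope Distributions

[Figure: isotope clusters, intensity vs. m/z, with peaks at the monoisotopic mass and at +1 Da, +2 Da, +3 Da]

Monoisotopic peak: all atoms are 12C, 14N, 16O, 1H, 32S

Natural abundance of the heavier isotopes:

2H   0.015%
13C  1.11%
15N  0.366%
17O  0.038%
18O  0.200%
33S  0.75%
34S  4.21%
36S  0.02%

Only 12C and 13C:

p = 0.0111

n is the number of C in the peptide

m is the number of 13C in the peptide

T_m is the relative intensity of the peak for the peptide with m 13C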
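The formula for the relative intensities did not survive in this transcript. A common model, assumed here and not confirmed by the source, treats each of the n carbons as independently 13C with probability p, so that the intensity for m 13C atoms follows a binomial distribution. The sketch below uses that assumption; the function names are hypothetical.

```python
from math import comb

P_13C = 0.0111  # probability that a carbon is 13C, as given on the slide


def isotope_probabilities(n_carbons, max_m=4, p=P_13C):
    """Binomial probability of exactly m 13C atoms among n carbons, m = 0..max_m.

    Assumes only carbon contributes to the isotope pattern (12C and 13C only).
    """
    return [
        comb(n_carbons, m) * p**m * (1 - p) ** (n_carbons - m)
        for m in range(max_m + 1)
    ]


def relative_intensities(n_carbons, max_m=4, p=P_13C):
    """Same distribution, scaled so the monoisotopic peak (m = 0) equals 1."""
    probs = isotope_probabilities(n_carbons, max_m, p)
    return [x / probs[0] for x in probs]


# Example: a peptide with 50 carbons (an arbitrary illustrative number).
for m, t in enumerate(relative_intensities(50)):
    print(f"+{m} Da: {t:.3f}")
```

Under this model the +1 Da peak grows roughly in proportion to the number of carbons, which is why larger peptides show broader isotope clusters.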

Slide15

Isotope Clusters and Charge State

[Figure: three isotope clusters, intensity vs. m/z]

1+: isotope peaks are 1 m/z apart

2+: isotope peaks are 0.5 m/z apart

3+: isotope peaks are 0.33 m/z apart

Slide16

What is the Charge State?

Cluster 1: 432.8990, 433.2330, 433.5671, 433.9014 (the spacing between the isotopes is 0.33 Da)

Cluster 2: 713.3225, 713.8239, 714.3251, 714.8263 (the spacing between the isotopes is 0.5 Da)
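A short sketch of how the charge state could be read off the isotope spacing, using the two peak clusters listed on this slide. The function name and the constant are assumptions for illustration: the spacing between neighbouring isotopes of the neutral molecule is taken as about 1.00335 Da (the 13C minus 12C mass difference), which the slides round to 1.

```python
ISOTOPE_SPACING = 1.00335  # Da; approximate 13C - 12C mass difference (assumed value)


def charge_from_isotopes(peaks):
    """Estimate the charge state from the m/z values of one isotope cluster.

    The average gap between neighbouring isotope peaks is about
    ISOTOPE_SPACING / charge, so the charge is the rounded ratio.
    """
    peaks = sorted(peaks)
    if len(peaks) < 2:
        raise ValueError("need at least two isotope peaks")
    mean_gap = (peaks[-1] - peaks[0]) / (len(peaks) - 1)
    return round(ISOTOPE_SPACING / mean_gap)


cluster_1 = [432.8990, 433.2330, 433.5671, 433.9014]
cluster_2 = [713.3225, 713.8239, 714.3251, 714.8263]
print(charge_from_isotopes(cluster_1))  # gaps of about 0.33
print(charge_from_isotopes(cluster_2))  # gaps of about 0.5
```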

Slide17

Protein Identification by Mass Spectrometry

Diagram labels: Samples; Peptides; Mass Spectrometry (m/z, intensity); Identity

Slide18

Protein Identification - Exercise

1. Protein identification: NUP1 was genomically tagged with protein A, affinity purified under two conditions, and the resulting protein mixture was analyzed with liquid chromatography mass spectrometry (LC-MS). Search the resulting spectra (NUP1-less-stringent-wash.mgf, NUP1-more-stringent-wash.mgf) using X! Tandem (http://h.thegpm.org/tandem/thegpm_tandem.html). Change the taxon to “S. cerevisiae (budding yeast)” but otherwise keep the default parameter settings.

a. Look at the list of identified proteins and explain why they are found in this sample. More information is also available by selecting the “go”, “path”, “ppi”, “doms”, “string” tabs at the top of the page.

b. Select the “mh” display at the top right of the page, and zoom in to +/-100 ppm (the default setting for the mass accuracy that was used in the search). What precursor mass accuracy should we have used? Zoom in further and determine what precursor mass accuracy could have been used if the spectra were recalibrated (so that the error distribution is centered at zero).
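For part b, it may help to see how a mass error in ppm is computed and what recentering an error distribution means. The sketch below is an illustration only and is not taken from X! Tandem: the function names are hypothetical, and recentering is done by subtracting the median error, which is one simple choice among several.

```python
from statistics import median


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, theoretical, tol_ppm=100.0):
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


def recentered(errors_ppm):
    """Shift a list of ppm errors so that their median is zero."""
    shift = median(errors_ppm)
    return [e - shift for e in errors_ppm]


# A 0.05 Da error on a 1000 Da precursor is about 50 ppm.
print(ppm_error(1000.05, 1000.0))
print(within_tolerance(1000.05, 1000.0))
# Errors that cluster around +12 ppm look much tighter after recentering.
print(recentered([10.0, 12.0, 14.0]))
```

After recentering, the tolerance only has to cover the spread of the errors rather than the spread plus the systematic offset.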

Slide19

Identification – Tandem MS

Slide20

Tandem MS – Sequence Confirmation

[Figure: MS/MS spectrum, % relative abundance (0 to 100) vs. m/z (250 to 1000)]

Sequence as displayed: K L E D E E L F G S

Slide21

Tandem MS – Sequence Confirmation

[Figure: MS/MS spectrum, % relative abundance (0 to 100) vs. m/z (250 to 1000)]

Residue   b ions
K         1166
L         1020
E         907
D         778
E         663
E         534
L         405
F         292
G         145
S         88

Slide22

Tandem MS – Sequence Confirmation

[Figure: MS/MS spectrum, % relative abundance (0 to 100) vs. m/z (250 to 1000)]

Residue   y ions   b ions
K         147      1166
L         260      1020
E         389      907
D         504      778
E         633      663
E         762      534
L         875      405
F         1022     292
G         1080     145
S         1166     88
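The b- and y-ion values on this slide can be reproduced with nominal (integer) residue masses. The sketch below is an illustration under stated assumptions: the peptide is written N-terminus first as SGFLEEDELK (the reverse of the display order, taking L for I/L and K for K/Q), the function names are hypothetical, and the proton and water masses are rounded to 1 and 18 Da.

```python
# Nominal residue masses in Da for the residues that occur in this peptide.
RESIDUE_MASS = {"S": 87, "G": 57, "F": 147, "L": 113, "E": 129, "D": 115, "K": 128}
PROTON = 1   # nominal proton mass
WATER = 18   # nominal mass of H2O


def b_ions(seq):
    """Singly charged b ions: N-terminal prefix residues plus one proton."""
    total, ions = 0, []
    for aa in seq[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total + PROTON)
    return ions


def y_ions(seq):
    """Singly charged y ions: C-terminal suffix residues plus water plus one proton."""
    total, ions = 0, []
    for aa in reversed(seq[1:]):
        total += RESIDUE_MASS[aa]
        ions.append(total + WATER + PROTON)
    return ions


peptide = "SGFLEEDELK"  # assumed N-to-C orientation
print("b:", b_ions(peptide))
print("y:", y_ions(peptide))
print("M+H:", sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON)
```

With nominal masses the largest y fragment listed comes out as 1079, the value used later in the de novo example; the table shows 1080 for the same fragment, which is what rounding the more exact mass would give. The value 1166 at the end of each series is the intact peptide (M+H) rather than a fragment.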

Slide23

Tandem MS – Sequence Confirmation

[Figure: the y- and b-ion table for K L E D E E L F G S next to the MS/MS spectrum (% relative abundance vs. m/z, 250 to 1000), now with the peaks labeled: [M+2H]2+; y ions 260, 389, 504, 633, 762, 875, 1022, 1080; b ions 292, 405, 534, 663, 778, 907, 1020]

Slide24

Tandem MS – Sequence Confirmation

[Figure: same y- and b-ion table and labeled MS/MS spectrum as on Slide23]

Slide25

Tandem MS – Sequence Confirmation

[Figure: same y- and b-ion table and labeled MS/MS spectrum as on Slide23, with the value 113 marked twice]

Slide26

Tandem MS – Sequence Confirmation

[Figure: same y- and b-ion table and labeled MS/MS spectrum as on Slide23, with the value 129 marked twice]

Slide27

Tandem MS – de novo Sequencing

[Figure: MS/MS spectrum, % relative abundance (0 to 100) vs. m/z (250 to 1000); labeled peaks: [M+2H]2+, 260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080]

Mass Differences

Amino acid masses

Sequences consistent with spectrum

Slide28

Tandem MS – de novo Sequencing

Slide29

Tandem MS – de novo Sequencing

Slide30

Tandem MS – de novo Sequencing

Partial sequences consistent with the spectrum:

…GF(I/L)EEDE(I/L)…

…(I/L)EDEE(I/L)FG…

Peptide M+H = 1166

1166 - 1079 = 87 => S

SGF(I/L)EEDE(I/L)…

1166 - 1020 - 18 = 128 => K or Q

SGF(I/L)EEDE(I/L)(K/Q)
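The step from mass differences to residues can be sketched as a lookup against a table of nominal amino acid masses. This is a simplified illustration, not the procedure of any particular de novo tool: the function name is hypothetical, only exact integer differences are matched, and the ladder used is the b-ion series 88 to 1020 from the earlier slides.

```python
# Nominal residue masses in Da. I/L and K/Q share a nominal mass and
# cannot be told apart at this resolution.
NOMINAL_RESIDUES = {
    57: "G", 71: "A", 87: "S", 97: "P", 99: "V", 101: "T", 103: "C",
    113: "(I/L)", 114: "N", 115: "D", 128: "(K/Q)", 129: "E", 131: "M",
    137: "H", 147: "F", 156: "R", 163: "Y", 186: "W",
}


def read_ladder(peaks):
    """Translate successive mass differences in an ion ladder into residues.

    Differences that match no residue are reported as 'X'.
    """
    peaks = sorted(peaks)
    return "".join(
        NOMINAL_RESIDUES.get(high - low, "X")
        for low, high in zip(peaks, peaks[1:])
    )


b_ladder = [88, 145, 292, 405, 534, 663, 778, 907, 1020]
print(read_ladder(b_ladder))

# Remaining residues from the peptide mass, as on the slide:
print(NOMINAL_RESIDUES[1166 - 1079])       # the residue below the ladder
print(NOMINAL_RESIDUES[1166 - 1020 - 18])  # the residue above the ladder
```

Real spectra need a mass tolerance instead of exact matching, and missing or extra peaks produce gaps and false residues, which is where the challenges listed on the next slide come in.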

Slide31

Tandem MS – de novo Sequencing

Challenges in de novo sequencing

Neutral loss (-H2O, -NH3)

Modifications

Background peaks

Incomplete information


Slide32

Tandem MS – Database Search

Experiment: Lysis, Fractionation, Digestion, LC-MS, MS/MS

Search:

Pick Protein (from Sequence DB)

Pick Peptide

All Fragment Masses

Compare, Score, Test Significance (against MS/MS)

Repeat for all peptides

Repeat for all proteins
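The search loop can be sketched as a toy program. Everything here is a simplification for illustration and does not reproduce X! Tandem or any other search engine: the two database entries are invented, digestion cleaves after every K or R with no missed cleavages, masses are nominal integers, the score is a plain count of matched peaks, and the significance test is left out.

```python
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "I": 113, "N": 114, "D": 115, "Q": 128, "K": 128,
    "E": 129, "M": 131, "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def tryptic_peptides(protein):
    """Cleave after every K or R (no proline rule, no missed cleavages)."""
    peptides, current = [], ""
    for aa in protein:
        current += aa
        if aa in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides


def precursor_mh(peptide):
    """Nominal M+H: residues + water (18) + proton (1)."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + 19


def fragment_masses(peptide):
    """All singly charged b and y fragment masses (nominal)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b = [sum(masses[:i]) + 1 for i in range(1, len(masses))]
    y = [sum(masses[i:]) + 19 for i in range(1, len(masses))]
    return b + y


def score(peaks, peptide, tol=0.5):
    """Number of observed peaks that match a theoretical fragment."""
    theoretical = fragment_masses(peptide)
    return sum(any(abs(p - t) <= tol for t in theoretical) for p in peaks)


def search(peaks, precursor, database, precursor_tol=1.0):
    """Score every peptide whose M+H matches the precursor; best hit first."""
    hits = []
    for name, protein in database.items():           # pick protein
        for peptide in tryptic_peptides(protein):    # pick peptide
            if abs(precursor_mh(peptide) - precursor) <= precursor_tol:
                hits.append((score(peaks, peptide), name, peptide))
    return sorted(hits, reverse=True)


# Invented two-entry database; peaks are the labeled peaks from the slides.
toy_db = {"toy_protein_1": "MKSGFLEEDELKAAR", "toy_protein_2": "GASPVTKLEDEELFGSK"}
peaks = [260, 292, 389, 405, 504, 534, 633, 663, 762, 778, 875, 907, 1020, 1022, 1080]
for hit in search(peaks, 1166, toy_db):
    print(hit)
```

Both toy proteins contain a peptide with the same precursor mass, so the precursor alone cannot separate them; the fragment comparison is what distinguishes the two candidates.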

Slide33

Information Content in a Single Mass Measurement

[Figure: two panels (S. cerevisiae, Human); axes: tryptic peptide mass [Da] (1000 to 3000), avg. # of matching peptides (1 to 10), # of matching peptides (1 to 10)]

Slide34

Protein Identification and Quantitation by Mass Spectrometry

Diagram labels: Samples; Peptides; Mass Spectrometry (m/z, intensity); Quantity

Slide35

Protein Quantitation by Mass Spectrometry

Diagram labels: Lysis; Fractionation; Digestion; LC-MS; MS

Indices: Sample i, Protein j, Peptide k

Slide36

Quantitation – Label-Free (MS)

Diagram labels: Lysis; Fractionation; Digestion; LC-MS; MS (one analysis per sample)

Assumption: constant for all samples

Indices: Sample i, Protein j, Peptide k

Slide37

Quantitation – Metabolic Labeling

Diagram labels: Light (L); Heavy (H); Lysis; Fractionation; Digestion; LC-MS; MS

Oda et al. PNAS 96 (1999) 6591

Ong et al. MCP 1 (2002) 376

Indices: Sample i, Protein j, Peptide k

Slide38

Quantitation – Labeled Synthetic Peptides

Diagram labels: Light (L); Synthetic Peptides (Heavy, H); Lysis; Fractionation; Digestion; Enrichment with Peptide antibody; LC-MS; MS

Assumption: All losses after mixing are identical for the heavy and light isotopes and

Gerber et al. PNAS 100 (2003) 6940

Anderson, N.L., et al. Proteomics 3 (2004) 235-44

Slide39

Estimating peptide quantity

[Figure: a peak, intensity vs. m/z, annotated with three estimates: Peak height, Peak area, Curve fitting]

Slide40

What is the best way to estimate quantity?

Peak height
- resistant to interference
- poor statistics

Peak area
- better statistics
- more sensitive to interference

Curve fitting
- better statistics
- needs to know the peak shape
- slow

Spectrum counting
- resistant to interference
- easy to implement
- poor statistics for low-abundance proteins
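To make the first two estimates concrete, the sketch below computes peak height and peak area for a synthetic peak. It is an illustration only: the Gaussian peak shape, the grid spacing and the function names are assumptions, and curve fitting is not shown because it would normally rely on a fitting library.

```python
from math import exp, pi, sqrt


def gaussian_peak(mz_values, center, sigma, height):
    """Synthetic Gaussian peak sampled at the given m/z values."""
    return [height * exp(-0.5 * ((x - center) / sigma) ** 2) for x in mz_values]


def peak_height(intensities):
    """Quantity estimate from the single highest data point."""
    return max(intensities)


def peak_area(mz_values, intensities):
    """Quantity estimate from trapezoidal integration over the whole peak."""
    points = list(zip(mz_values, intensities))
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Peak centered at m/z 500 with sigma 0.02, sampled every 0.002 m/z.
mz_grid = [499.8 + 0.002 * i for i in range(201)]
signal = gaussian_peak(mz_grid, center=500.0, sigma=0.02, height=1000.0)
expected_area = 1000.0 * 0.02 * sqrt(2 * pi)  # analytic area of a Gaussian

print("height:", peak_height(signal))
print("area:  ", peak_area(mz_grid, signal), "expected:", expected_area)
```

The height depends on a single point, so noise on that point carries straight into the estimate; the area averages over many points but also picks up any overlapping signal inside the integration window.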

Slide41

Proteomics Informatics - Summary

Structure of mass spectrometry data

Protein identification

Protein quantitation

Slide42

Next Lecture:

Gene Expression

Slide43

Protein Quantitation - Exercise

2. Protein quantitation: Two breast tumor xenografts (one basal and one luminal) were analyzed by LC-MS, and the spectral counts for the identified peptides in the different analyses are listed in two-sample-three-replicate-comparison.txt.

a. Compare replicate one of Sample 1 with replicate one of Sample 2 using proteomics_no_replicate.py. Which differences are significant?

b. Compare replicate one and two of Sample 1 using proteomics_one_replicate.py. Compare to the distribution in 2a. Which differences are significant in 2a?

c. Compare the three replicates of Sample 1 with the three replicates of Sample 2 using proteomics_three_replicates.py. Which differences are significant?

d. In cases when a protein is not observed in one sample, how many spectra do we need to observe in the other sample to say that there is a significant difference?
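One possible way to approach part d is a one-sided Fisher's exact test on the spectral counts. The sketch below is an assumption-laden illustration and is not the method used by the proteomics_*.py scripts, whose contents are not shown here: the function names, the total of 1000 spectra per sample and the 0.05 threshold are all hypothetical, and no correction for multiple testing is applied.

```python
from math import comb


def fisher_one_sided(count_a, total_a, count_b, total_b):
    """One-sided Fisher's exact test for a protein's spectral counts.

    count_a of total_a spectra in sample A and count_b of total_b spectra in
    sample B belong to the protein. Returns the probability, under the null
    hypothesis of equal abundance, of sample A getting count_a or more of
    the protein's count_a + count_b spectra.
    """
    k_total = count_a + count_b
    n_total = total_a + total_b
    upper = min(k_total, total_a)
    favourable = sum(
        comb(total_a, k) * comb(total_b, k_total - k)
        for k in range(count_a, upper + 1)
    )
    return favourable / comb(n_total, k_total)


def min_spectra_for_significance(total_a, total_b, alpha=0.05):
    """Smallest count in sample A that is significant when sample B has zero."""
    k = 1
    while fisher_one_sided(k, total_a, 0, total_b) >= alpha:
        k += 1
    return k


# Hypothetical example: 1000 identified spectra in each sample.
for k in range(1, 7):
    print(k, round(fisher_one_sided(k, 1000, 0, 1000), 4))
print("minimum count:", min_spectra_for_significance(1000, 1000))
```

With equal totals the p-value roughly halves with each additional spectrum, so the threshold depends mainly on the chosen significance level; comparing against replicate-to-replicate variation, as parts b and c ask, is a separate and complementary check.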

Slide44

Phosphorylation Exercise: an unmodified peptide

Theoretical fragment ions

Slide45

Spectrum of

the phosphorylated peptide

Slide46

Spectrum of the peptide

phosphorylated at a different site