Protein function and classification

Uploaded On 2022-06-13

Presentation Transcript

Slide1

Protein function and classification

Hsin-Yu Chang

www.ebi.ac.uk

Slide2

Classifying proteins into families and identifying protein homologues can help scientists to characterise unknown proteins.

Slide3

Greider and Blackburn discovered telomerase in 1984 and were awarded the Nobel Prize in 2009. Which model organism did they use for this study?

1. Tetrahymena thermophila

2. Saccharomyces cerevisiae

3. Mouse

4. Human

Slide4

A single Tetrahymena thermophila cell has 40,000 telomeres, whereas a human cell has only 92.

1984: Discovery of telomerase, Greider and Blackburn

1989: Telomere hypothesis of cell senescence, Szostak

1995: Clone hTR

1995/1997: Clone hTERT

1997: Telomerase knockout mouse

1998: Ectopic expression of telomerase in normal human epithelial cells causes the extension of their lifespan

1999/2000…: Telomerase/telomere dysfunctions and cancer

Gilson and Ségal-Bendirdjian, Biochimie, 2010.

Slide5

Can we identify human telomerase from the Tetrahymena protein sequence?

Slide6

Let’s pretend that human telomerase has not been identified and we only know the protein sequences of Tetrahymena telomerase. How can we find the human telomerase?

Slide7

BLAST (Basic Local Alignment Search Tool): compares protein sequences to sequence databases and calculates the statistical significance of matches.

Slide8

BLAST

Advantages:

Relatively fast

User friendly

Very good at recognising similarity between closely related sequences

Drawbacks:

Sometimes struggles with multi-domain proteins

Less useful for weakly similar sequences (e.g., divergent homologues)

Slide9

Using Tetrahymena telomerase protein sequences as a query in BLAST, you will find a few human proteins that have very low identity.

Slide10

Tetrahymena and putative human telomerase (AAC51724.1) have a poor protein sequence match.

Slide11

Can we presume this protein is a telomerase homologue from humans?

Can we find more information about it before pursuing it further?

Slide12

Search for protein signatures (such as domains) in AAC51724.1

Telomerase ribonucleoprotein complex - RNA binding domain

Reverse transcriptase domain

Slide13

AAC51724.1 shares 23% identity with Tetrahymena telomerase. It also contains the same domains as telomerase.

Plan experiments and find out more!
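An identity figure like the 23% quoted on this slide is derived from a pairwise alignment. The sketch below shows one simple way to compute it, assuming the two sequences are already aligned and use `-` for gaps. The function name `percent_identity` and the toy fragments are illustrative only (they are not real telomerase sequence), and conventions differ on whether gapped columns count in the denominator; here they are skipped.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Hypothetical aligned fragments, for illustration only.
toy_a = "MKT-LLVAG"
toy_b = "MRTALLV-G"
print(round(percent_identity(toy_a, toy_b), 1))  # prints 85.7
```

A low value such as 23% does not rule out homology on its own, which is why the slides go on to look for shared domains.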

Slide14

But, where can we search for information about the protein domains?

Slide15

Protein databases that use signature approaches

Structural domains

Functional annotation of families/domains

Protein features (sites)

Hidden Markov Models

Fingerprints

Profiles

Patterns

HAMAP

Slide16

Construction of protein signatures

Construction of a multiple sequence alignment (MSA) from characterised protein sequences.

Modelling the pattern of conserved amino acids at specific positions within a MSA.

Use these models to infer relationships with the characterised sequences.

Slide17

Three different protein signature approaches

Sequence alignment

Patterns: single motif methods

Fingerprints: multiple motif methods

Profiles & Hidden Markov Models (HMMs): full alignment methods

Slide18

Patterns

Slide19

Patterns

Sequence alignment

Motif

Pattern signature (PS00000)

Regular expression: [AC]-x-V-x(4)-{ED}

Pattern sequences: ALVKLISG, AIVHESAT, CHVRDLSC, CPVESTIS

Patterns are usually directed against functional sequence features such as active sites, binding sites, etc.
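The pattern notation on this slide maps closely onto ordinary regular expressions. The sketch below translates a simple pattern of this kind into a Python regex and checks it against the slide's example sequences. The helper name `prosite_to_regex` is hypothetical, only a small subset of the notation is handled, and splitting the slide's run of residues into four 8-residue sequences is an assumption based on the pattern's length.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex.

    Handles only: single residues, x (any residue), [..] allowed sets,
    {..} forbidden sets, and (n) or (n,m) repeat counts.
    """
    element_re = re.compile(
        r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?"
    )
    parts = []
    for element in pattern.replace(" ", "").split("-"):
        match = element_re.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
# Assumed split of the slide's example residues into 8-mers.
examples = ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]
print(regex, [bool(re.fullmatch(regex, seq)) for seq in examples])
```

Because a pattern either matches or does not, a single substitution at a constrained position (for example, E in the last position here) removes the hit entirely.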

Slide20

PDOC00199

[SAG]-G-G-T-G-[SA]-G

Tubulin signature

A conserved motif in tubulins

Slide21

Patterns

Advantages:

Strict: a pattern allows very little variability and can produce highly accurate matches

Drawbacks:

Simple but less flexible

Slide22

Fingerprints

Slide23

Fingerprints: a multiple motif approach

Sequence alignment

Define motifs: Motif 1, Motif 2, Motif 3

Motif sequences

Weight matrices

Fingerprint signature (PR00000)

Slide24

Telomerase signature (PR01365)

Motif 1

Motif 2

Motif 3

Motif 4

Slide25

The significance of motif context: order and interval

Identify small conserved regions in proteins

Several motifs characterise a family (motifs 1, 2, 3)
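The idea that motif order and interval both matter can be sketched in a few lines. This is a simplification under stated assumptions: real fingerprints score motifs with weight matrices, whereas here each motif is a plain regular expression, and the function name `match_fingerprint`, the `max_gap` parameter, and the toy sequence are all hypothetical.

```python
import re
from typing import List, Optional


def match_fingerprint(
    sequence: str, motifs: List[str], max_gap: int
) -> Optional[List[int]]:
    """Return start positions of all motifs found in order, or None.

    Each motif must begin at or after the end of the previous one,
    with at most max_gap residues separating them.
    """
    compiled = [re.compile(motif) for motif in motifs]

    def search(index: int, min_start: int, max_start: Optional[int]):
        if index == len(compiled):
            return []
        for start in range(min_start, len(sequence)):
            if max_start is not None and start > max_start:
                break
            hit = compiled[index].match(sequence, start)
            if hit is None:
                continue
            rest = search(index + 1, hit.end(), hit.end() + max_gap)
            if rest is not None:
                return [start] + rest
        return None

    return search(0, 0, None)


toy = "AAGGTGSGAAAAKLVCCCCWHW"  # made-up sequence for illustration
print(match_fingerprint(toy, ["GGTG", "KLV", "W.W"], max_gap=6))
print(match_fingerprint(toy, ["W.W", "KLV", "GGTG"], max_gap=6))
```

The first call finds the motifs in order; the second presents the same motifs in the wrong order and finds nothing, which is the point of using motif context rather than isolated motifs.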

Slide26

Fingerprints

Good at modelling the often small differences between closely related proteins

Distinguish individual subfamilies within protein families, allowing functional characterisation of sequences at a high level of specificity

Figure legend: amino acids relatively well conserved across all chloride channel protein family members; amino acids uniquely conserved in chloride channel protein 3 subfamily members.

Slide27

Profiles & HMMs

Slide28

Profiles & HMMs

Sequence alignment

Define coverage: entire domain or whole protein

Use entire alignment of domain or protein family

Build model (Profile or HMMs)

Profile or HMM signature

Slide29

Profiles

Start with a multiple sequence alignment

Amino acids at each position in the alignment are scored according to the frequency with which they occur

Scores are weighted according to evolutionary distance using a BLOSUM matrix

Good at identifying homologues
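The per-position scoring described on this slide can be illustrated with a small position-specific scoring matrix. This is a minimal sketch, not the method used by any particular database: it replaces the BLOSUM-based weighting mentioned above with a flat pseudocount and a uniform background frequency, and the names `build_profile` and `score_segment` are hypothetical. The toy alignment reuses the four example sequences from the Patterns slide, assuming they split into 8-residue rows.

```python
import math
from collections import Counter
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(
    alignment: List[str], pseudocount: float = 1.0
) -> List[Dict[str, float]]:
    """Build log-odds scores per alignment column (uniform background)."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        # Count residues in this column, ignoring gaps and unknown symbols.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {
                aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                for aa in AMINO_ACIDS
            }
        )
    return profile


def score_segment(profile: List[Dict[str, float]], segment: str) -> float:
    """Sum the column scores for a segment the same length as the profile."""
    return sum(column[res] for column, res in zip(profile, segment))


toy_alignment = ["ALVKLISG", "AIVHESAT", "CHVRDLSC", "CPVESTIS"]
toy_profile = build_profile(toy_alignment)
print(score_segment(toy_profile, "ALVKLISG") > score_segment(toy_profile, "WWWWWWWW"))
```

Unlike a pattern, a profile gives every residue at every position a score, so a sequence with a few unusual residues can still score well overall.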

Slide30

HMMs

Start with a multiple sequence alignment

Amino acid frequency at each position in the alignment and their transition probabilities are encoded

Insertions and deletions are also modelled

Advantages:

Very good at identifying evolutionarily distant homologues

Can model very divergent regions of alignment

Slide31

Three different protein signature approaches

Patterns: single motif methods

Fingerprints: multiple motif methods

Profiles & HMMs (hidden Markov models): full alignment methods

Slide32

www.ebi.ac.uk/interpro

Fingerprints

Patterns

Profiles & HMMs (hidden Markov models)

Slide33

Structural domains

Functional annotation of families/domains

Protein features (sites)

Hidden Markov Models

Fingerprints

Profiles

Patterns

HAMAP

Slide34

The aim of InterPro

Family entry:

description, proteins matched and more information.

Domain entry:

description, proteins matched and more information.

Site entry: description, proteins matched and more information.

Protein sequences

Slide35

What is InterPro?

InterPro is an integrated sequence analysis resource

It combines predictive models (known as signatures) from different databases

It provides functional analysis of protein sequences by classifying them into families and predicting domains and important sites

Slide36

Facts about InterPro

First release in 1999

11 partner databases

Adds annotation to UniProtKB/TrEMBL

Provides matches to over 80% of UniProtKB

Source of >85 million Gene Ontology (GO) mappings to >24 million distinct UniProtKB sequences

50,000 unique visitors to the web site per month

>2 million sequences searched online per month, plus offline searches with the downloadable version of the software

Slide37

InterPro signature integration process

Signatures are provided by member databases

They are scanned against the UniProt database to see which sequences they match

InterPro curators manually inspect the matches before integrating the signatures into InterPro

Slide38

InterPro signature integration process

Signatures representing the same entity are integrated together

Relationships between entries are traced, where possible

Curators add literature-referenced abstracts, cross-references to other databases, and GO terms

Slide39

http://www.ebi.ac.uk/interpro/

Slide40

Search using protein sequences

Slide41

Family

Slide42

Type

Slide43

InterPro entry types

Family: Proteins that share a common evolutionary origin, as reflected in their related functions, sequences or structure. Ex. Telomerase family.

Domain: Distinct functional, structural or sequence units that may exist in a variety of biological contexts. Ex. DNA binding domain.

Repeats: Short sequences typically repeated within a protein. Ex. Tubulin binding repeats in microtubule associated protein Tau.

Sites: PTM, Active Site, Binding Site, Conserved Site. Ex. Phosphorylation sites, ion binding sites, tubulin conserved site.

Slide44

Type

Name

Identifier

Contributing signatures

Description

GO terms

References

Slide45

Slide46

Slide47

Slide48

Slide49

Type

Name

Identifier

Contributing signatures

Description

References

Relationships

Slide50

InterPro family and domain relationships

Slide51

Family relationships in InterPro:

Interleukin-15/Interleukin-21 family (IPR003443)

Interleukin-15 (IPR020439)

Interleukin-15 Avian (IPR020451)

Interleukin-15 Fish (IPR020410)

Interleukin-15 Mammal (IPR020466)

Interleukin-21 (IPR028151)
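A family hierarchy like this one is naturally represented as a child-to-parent mapping. The sketch below uses the accessions listed on the slide; the nesting (avian, fish and mammal entries under Interleukin-15, and Interleukin-15 and Interleukin-21 under the combined family) is inferred from the entry names, since the flattened slide no longer shows indentation. The `PARENT` table and the `lineage` and `children` helpers are illustrative names, not part of any InterPro API.

```python
from typing import Dict, List

NAMES: Dict[str, str] = {
    "IPR003443": "Interleukin-15/Interleukin-21 family",
    "IPR020439": "Interleukin-15",
    "IPR020451": "Interleukin-15 Avian",
    "IPR020410": "Interleukin-15 Fish",
    "IPR020466": "Interleukin-15 Mammal",
    "IPR028151": "Interleukin-21",
}

# Child -> parent links; the nesting is an assumption based on entry names.
PARENT: Dict[str, str] = {
    "IPR020439": "IPR003443",
    "IPR028151": "IPR003443",
    "IPR020451": "IPR020439",
    "IPR020410": "IPR020439",
    "IPR020466": "IPR020439",
}


def lineage(accession: str) -> List[str]:
    """Return the path from the root family down to the given entry."""
    path = [accession]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def children(accession: str) -> List[str]:
    """Return the direct child entries of the given entry, sorted."""
    return sorted(child for child, parent in PARENT.items() if parent == accession)


print(" > ".join(NAMES[acc] for acc in lineage("IPR020466")))
```

Walking up the lineage is how a match to a specific subfamily also implies membership of each broader family above it.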

Slide52

Relationships

Slide53

InterPro relationships: domains

Protein kinase-like domain

Protein kinase domain

Serine/threonine kinase catalytic domain

Tyrosine kinase catalytic domain

Slide54

Slide55

Gene Ontology

Allow cross-species and/or cross-database comparisons

Unify the representation of gene and gene product attributes across species

Slide56

The Concepts in GO

1. Molecular Function: protein kinase activity, insulin receptor activity

2. Biological Process: cell cycle, microtubule cytoskeleton organisation

3. Cellular Component

Slide57

GO:0003677 DNA binding

GO:0003721 telomeric template RNA reverse transcriptase activity

GO:0005634 Nucleus

Slide58

Search using keywords

Slide59

Slide60

Slide61

Summary

Protein classification can help scientists to gain information about protein functions.

BLAST is fast and easy to use but has its drawbacks.

Alternative approach: protein signature databases build models (protein signatures) by using different methods (patterns, fingerprints, profiles and HMMs).

InterPro integrates these signatures from 11 member databases. It serves as a sequence analysis resource that classifies sequences into protein families and predicts important domains and sites.

Slide62

Slide63

Why use InterPro?

Large amounts of manually curated data

35,634 signatures integrated into 25,214 entries

Cites 38,877 PubMed publications

Large coverage of protein sequence space

Regularly updated

~8 week release schedule

New signatures added

Scanned against latest version of UniProtKB

Slide64

Caution

InterPro is a predictive protein signature database: results are predictions, and should be treated as such

InterPro entries are based on signatures supplied to us by our member databases... this means no signature, no entry!

And one more thing…

We need your feedback!

missing/additional references

reporting problems

requests

EBI support page.

Slide65

The InterPro Team:

Amaia Sangrador

Craig McAnulla

Matthew Fraser

Maxim Scheremetjew

Siew-Yit Yong

Alex Mitchell

Sebastien Pesseat

Sarah Hunter

Gift Nuka

Hsin-Yu Chang

www.ebi.ac.uk/interpro

Twitter: @InterProDB

Slide66

Database | Basis | Institution | Built from | Focus | URL
Pfam | HMM | Sanger Institute | Sequence alignment | Family & Domain based on conserved sequence | http://pfam.sanger.ac.uk/
Gene3D | HMM | UCL | Structure alignment | Structural Domain | http://gene3d.biochem.ucl.ac.uk/Gene3D/
Superfamily | HMM | Uni. of Bristol | Structure alignment | Evolutionary domain relationships | http://supfam.cs.bris.ac.uk/SUPERFAMILY/
SMART | HMM | EMBL Heidelberg | Sequence alignment | Functional domain annotation | http://smart.embl-heidelberg.de/
TIGRFAM | HMM | J. Craig Venter Inst. | Sequence alignment | Microbial Functional Family Classification | http://www.jcvi.org/cms/research/projects/tigrfams/overview/
Panther | HMM | Uni. S. California | Sequence alignment | Family functional classification | http://www.pantherdb.org/
PIRSF | HMM | PIR, Georgetown, Washington D.C. | Sequence alignment | Functional classification | http://pir.georgetown.edu/pirwww/dbinfo/pirsf.shtml
PRINTS | Fingerprints | Uni. of Manchester | Sequence alignment | Family functional classification | http://www.bioinf.manchester.ac.uk/dbbrowser/PRINTS/index.php
PROSITE | Patterns & Profiles | SIB | Sequence alignment | Functional annotation | http://expasy.org/prosite/
HAMAP | Profiles | SIB | Sequence alignment | Microbial protein family classification | http://expasy.org/sprot/hamap/
ProDom | Sequence clustering | PRABI: Rhône-Alpes Bioinformatics Center | Sequence alignment | Conserved domain prediction | http://prodom.prabi.fr/prodom/current/html/home.php

Slide67

Thank you!

www.ebi.ac.uk

Twitter: @emblebi

Facebook: EMBLEBI

YouTube: EMBLMedia

Slide68

The BLOSUM (BLOcks SUbstitution Matrix) matrix is a substitution matrix used for sequence alignment of proteins. BLOSUM matrices are used to score alignments between evolutionarily divergent protein sequences.
