Slide1
Protein Analysis Tools
2nd April, 2012
Ansuman Chattopadhyay, PhD
Head, Molecular Biology Information Service
Health Sciences Library System
University of Pittsburgh
ansuman@pitt.edu
http://www.hsls.pitt.edu/guides/genetics
Slide2
What we’ll do:
Brief overview of CLC Main Workbench
Find the genomic context of a protein sequence
Search for the presence of conserved domains
Create a multiple sequence alignment plot
Slide3
What we’ll do:
Analyze primary structure, such as hydrophobicity, hydrophilicity, antigenicity, repeat sequence detection, etc.
Predict secondary structure
Predict post-translational modifications, such as phosphorylation, glycosylation, etc.
Search for interacting partners
Predict domain-driven protein-protein interactions
Slide4
Workshop Resources
http://www.hsls.pitt.edu/molbio/tutorials
Slide5
HSLS MolBio Videos
Slide6
Sequence Analysis Software Suites
Wisconsin GCG
Vector NTI
DNASTAR Lasergene
Geneious
CLC Main
Slide7
Why CLC Main?
Windows, Mac, Linux
DNA, RNA, protein, and microarray data analysis
Regular updates
HSLS licensed
Slide8
CLC Main Access
HSLS CLC Main Registration
Link: http://www.hsls.pitt.edu/molbio/clcmain
Access via Pitt - Network Connect
Instruction video: http://goo.gl/JNjMt
Slide9
CLC Main Workbench Overview
Graphical User Interface
Protein sequence import
Sequence navigation
Slide10
CLC Main Graphical User Interface (GUI)
Slide11
CLC Main
Slide12
Navigate a protein sequence
Slide13
Videos
CLC Main – getting started (basic navigation steps): http://media.hsls.pitt.edu/media/molbiovideos/clc-navigation-ac0312.swf
CLC Main Workbench Walkthrough (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part1-ac0112.swf
CLC Main Workbench Walkthrough (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clcmain-walkthrough-part2-ac0112.swf
Slide14
Import a Protein Sequence
Slide15
Protein Sequence
Human PLCg1
RefSeq no.: NP_002651
UniProt accession number: P19174
FASTA file
Raw sequence
CLC features: Search, Import, Create new sequence
Slide16
Videos
Import a DNA/protein sequence into CLC Main (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part1-ac0112.swf
Import a DNA/protein sequence into CLC Main (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/clc-import-part2-ac0112.swf
Slide17
CLC protein sequence
Slide18
Protein sequence manipulation
Create a new protein with the PLCg1 SH2-SH2-SH3 domains
Slide19
Sequence Alignment
Pair-wise Alignment
  Global
  Local
Multiple Sequence Alignment
Slide20
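The difference between global and local pair-wise alignment can be sketched with a small dynamic-programming scorer. This is a minimal illustration only: the function name and the match, mismatch and gap values are assumptions chosen for the example, not the scoring scheme used by CLC Main, ClustalW or BLAST, which rely on substitution matrices and affine gap penalties.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Return the best alignment score for sequences a and b.

    local=False: Needleman-Wunsch style global alignment (end to end).
    local=True:  Smith-Waterman style local alignment (best sub-regions).
    Scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment pays for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            cell = max(diag, up, left)
            if local:
                # A local alignment may restart anywhere, so scores never drop below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]
```

With these assumed scores, a short peptide embedded in a longer sequence scores well locally but poorly globally, because the global alignment is charged for the unmatched flanks.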
Sequence Alignment
Slide21
Pair-wise Sequence Alignment
Slide22
Multiple Sequence Alignment
Slide23
Multiple Sequence Alignment
Tools: ClustalW and T-Coffee
Slide24
PLCg1 Orthologous Sequences
PLCg1:
Mouse: NP_067255
Rat: NP_037319
Cow: NP_776850
Dog: XP_542998
Zebrafish: NP_919388
Human: NP_002651
NP_067255,NP_037319,NP_776850,XP_542998,NP_919388,NP_002651
Slide25
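Outside CLC Main, the PLCg1 ortholog accessions listed above could also be retrieved as one FASTA file through the NCBI E-utilities efetch service. The sketch below only builds the request URL; the helper name and the dictionary are illustrative assumptions, and the workshop itself imports the sequences through the CLC search feature.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# RefSeq protein accessions for the PLCg1 orthologs, as listed on the slide.
PLCG1_ORTHOLOGS = {
    "Mouse": "NP_067255",
    "Rat": "NP_037319",
    "Cow": "NP_776850",
    "Dog": "XP_542998",
    "Zebrafish": "NP_919388",
    "Human": "NP_002651",
}


def efetch_fasta_url(accessions):
    """Build an efetch URL that returns the given protein records as FASTA text."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


url = efetch_fasta_url(PLCG1_ORTHOLOGS.values())
```

Opening `url` in a browser, or passing it to `urllib.request.urlopen`, should return a multi-sequence FASTA file suitable as input for a multiple sequence alignment; this step needs network access.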
Videos
Create a multiple sequence alignment plot using CLC (Part 1): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212 part1.swf
Create a multiple sequence alignment plot using CLC (Part 2): http://media.hsls.pitt.edu/media/molbiovideos/msf-clcmain-ac0212-part2.swf
Create a multiple sequence alignment plot: http://media.hsls.pitt.edu/media/clres2705/msa.swf
Compare two peptide sequences: http://media.hsls.pitt.edu/media/clres2705/blast2.swf
Slide26
Starting with a short peptide sequence, find:
The whole protein sequence
Orthologs in other species (nematode)
Tools:
UCSC BLAT
NCBI BLAST against SwissProt
Slide27
Peptide to whole protein
Peptide seq: SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR
Slide28
Videos
Place an mRNA or peptide sequence into the human genome (BLAT): http://www.hsls.pitt.edu/molbio/videos/play?v=12e
Find homologous sequences: http://media.hsls.pitt.edu/media/clres2705/blast.swf
Slide29
Find homologous sequence
SPEGCWGPEPRDCVSCRNVSRGRECVDKCNLLEGEPR
Slide30
Sequence Manipulation & Format Conversion
Sequence Manipulation Suite: http://bioinformatics.org/sms2/
Readseq: http://thr.cit.nih.gov/molbio/readseq/
GenPept
FASTA
Slide31
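The core of a conversion to FASTA is simple enough to sketch by hand. The hypothetical helper below takes a raw sequence, for example one copied from the ORIGIN block of a GenPept record with its position numbers and spaces, and writes it as a FASTA record. It is a minimal illustration of the format, not a replacement for Readseq or the Sequence Manipulation Suite.

```python
def to_fasta(name, raw_sequence, width=60):
    """Format a raw sequence as a FASTA record.

    Digits, spaces and line breaks in the input are dropped, so text copied
    from a numbered sequence listing can be pasted in directly.
    """
    seq = "".join(ch for ch in raw_sequence if ch.isalpha()).upper()
    # Wrap the sequence into lines of at most `width` residues.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


record = to_fasta("peptide", "  1 spegcwgpep rdcvscrnvs\n 21 rgrecvdkcn llegepr")
```

The 60-residue line width is a common convention and an assumption here; FASTA readers generally accept other widths.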
Hands-On
Retrieve the amino acid sequence between positions 25 and 45 in Sequence A (MS Word doc)
Identify the rat gene that encodes this peptide fragment and retrieve its whole protein sequence
Find the fruit fly homolog of this protein. What % identity does the fruit fly protein share with its rat homolog?
Predict potential MAPK phosphorylation sites present in the fruit fly protein
Slide32
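Two steps of the hands-on exercise, cutting out residues 25 to 45 and reading off a percent identity, can be sketched as below. Both helper names are hypothetical. Percent identity is computed here over aligned columns, which is one common convention; tools differ in what they use as the denominator, so the number reported by BLAST or CLC Main may not match exactly.

```python
def subsequence(seq, start, end):
    """Return residues start..end, counted from 1 and inclusive of both ends."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions must satisfy 1 <= start <= end <= len(seq)")
    return seq[start - 1:end]


def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns holding the same residue; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns that are gaps in both sequences.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)
```

Note that positions 25 to 45 inclusive give a 21-residue fragment, not 20.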
Protein Domain Search: InterPro Scan
InterPro is a database of protein families, domains, regions, repeats and sites in which identifiable features found in known proteins can be applied to new protein sequences.
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Slide33
Videos:
Find protein domains, PTMs, secondary structure, etc.: http://media.hsls.pitt.edu/media/clres2705/uniprot.swf
Start with a protein pattern and find which proteins possess that domain: http://media.hsls.pitt.edu/media/clres2705/scanprosite.swf
Search for protein domains, repeats and sites: http://media.hsls.pitt.edu/media/clres2705/interpro.swf
Slide34
Protein Domain Search: ScanProsite
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Slide35
Pattern Search
[AC]-x-V-x(4)-{ED}
This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
F-[GSTV]-P-R-L-[G>]
Slide36
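The pattern syntax shown on the Pattern Search slide maps closely onto regular expressions. The sketch below translates a PROSITE-style pattern into a Python regex; the function names are hypothetical and only the common elements are covered (x, [...], {...}, repeat counts, and the < and > terminus anchors, including > inside brackets as in [G>]). ScanProsite itself supports more than this.

```python
import re


def _convert_core(core):
    """Translate one pattern element, without its repeat count, to regex."""
    if core == "x":
        return "."                      # any residue
    if core.startswith("[") and core.endswith("]"):
        inner = core[1:-1]
        if ">" in inner:                # e.g. [G>]: G or the C-terminus
            return "(?:[" + inner.replace(">", "") + "]|$)"
        return "[" + inner + "]"        # any one of the listed residues
    if core.startswith("{") and core.endswith("}"):
        return "[^" + core[1:-1] + "]"  # any residue except those listed
    return core                         # a single fixed residue


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern string to a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):     # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):       # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split an element such as x(4) or x(2,4) into core and repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + _convert_core(core) + repeat + suffix)
    return "".join(parts)
```

The resulting string can be passed to `re.finditer` to locate every match in a protein sequence, for example `re.finditer(prosite_to_regex("[AC]-x-V-x(4)-{ED}"), sequence)`.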
Pattern Search
Slide37
Protein Primary Structure Analysis
Tool: ExPASy from SIB
Calculated Mol Wt
Theoretical pI
Extinction coefficients
Estimated half-life
Hydropathicity plot: Kyte & Doolittle
Hydrophilicity plot: Hopp T.P., Woods K.R.
Slide38
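A Kyte & Doolittle hydropathicity plot is a sliding-window average of per-residue hydropathy values. The sketch below uses the published Kyte-Doolittle scale; the function names and the default window of 9 residues are assumptions for illustration, and the ExPASy tools offer further options such as window weighting that are not reproduced here.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along the sequence.

    Entry i covers residues i+1 .. i+window (1-based); a sequence shorter
    than the window yields an empty list.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def gravy(seq):
    """Grand average of hydropathy: the mean value over the whole sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq.upper()) / len(seq)
```

Plotting the profile against the window start position gives the familiar hydropathy curve; sustained positive stretches are the usual hint of a membrane-spanning segment.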
Antigenic Site Prediction
Tool: EMBOSS Antigenic
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Slide39
EMBOSS Antigenic
Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar.
Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be a part of antigenic sites. A semi-empirical method which makes use of physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes was developed by Kolaskar and Tongaonkar to predict antigenic determinants on proteins. Application of this method to a large number of proteins has shown that it can predict antigenic determinants with about 75% accuracy, which is better than most of the known methods. This method is based on a single parameter and is thus very simple to use.
Slide40
Transmembrane Region Prediction
Slide41
Transmembrane Site Prediction
Tool: TMHMM Server
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Slide42
Protein Secondary Structure
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Slide43
Protein-Protein Interactions Prediction
Tool: STRING
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Slide44
Hands-on
Take the human BCL2 protein sequence and:
Find its domain architecture
Predict the topology of its transmembrane region
Design a suitable antigenic site for antibody generation
What are its calculated Mol Wt and Ext Coefficient?
Predict its secondary structure
What % of this protein possesses alpha-helical structure?
Predict its potential interacting partners
Slide45
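For the extinction coefficient question in the BCL2 hands-on, the value can be estimated from the amino acid composition alone. The sketch below assumes the molar extinction values at 280 nm commonly cited for this kind of calculation (Trp 5500, Tyr 1490, cystine 125, in M^-1 cm^-1, measured in water); the function name is hypothetical and the result should be checked against the ExPASy output rather than taken as authoritative.

```python
def extinction_coefficient_280(seq, cystines=False):
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1).

    cystines=False: all Cys residues are assumed reduced.
    cystines=True:  all Cys residues are assumed paired into cystines.
    """
    seq = seq.upper()
    # Assumed per-residue contributions: Trp 5500, Tyr 1490.
    value = seq.count("W") * 5500 + seq.count("Y") * 1490
    if cystines:
        # Each disulfide-bonded pair of Cys (one cystine) contributes 125.
        value += (seq.count("C") // 2) * 125
    return value
```

Reporting both values mirrors the usual practice of giving one estimate with all cysteines reduced and one with all pairs forming cystines.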
Hands-on
Prediction of potential phosphorylation sites present in a protein sequence.
Sequence: human BCL2
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Slide46
Phosphorylation Site Prediction:
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Tool: NetPhos
Slide47
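As a quick sanity check alongside the phosphorylation site predictors, candidate MAPK sites can be listed with a plain motif scan. The sketch assumes the widely quoted proline-directed consensus: a Ser or Thr immediately followed by Pro as the minimal motif, and Pro-x-Ser/Thr-Pro as the fuller one. This is a naive pattern match and not the method used by NetPhos or GPS, which score sequence context with trained models; the function name is hypothetical.

```python
import re


def proline_directed_sites(seq):
    """List Ser/Thr residues that are immediately followed by Pro.

    Returns tuples of (position, residue, full_consensus), where position is
    1-based and full_consensus is True when a Pro also sits two residues
    before the site (P-x-[ST]-P).
    """
    seq = seq.upper()
    sites = []
    # The lookahead finds every S/T-P pair without consuming characters.
    for m in re.finditer(r"(?=([ST])P)", seq):
        i = m.start()
        full = i >= 2 and seq[i - 2] == "P"
        sites.append((i + 1, m.group(1), full))
    return sites
```

Sites returned here are candidates only; comparing them with the predictor output shows which ones also have supporting sequence context.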
Phosphorylation Site Prediction:
>gi|72198189|ref|NP_000624.2| B-cell lymphoma protein 2 alpha isoform
MAHAGRTGYDNREIVMKYIHYKLSQRGYEWDAGDVGAAPPGAAPAPGIFSSQPGHTPHPAASRDPVARTSPLQTPAAPGAAAGPALSPVPPVVHLTLRQAGDDFSRRYRRDFAEMSSQLHLTPFTARGRFATVVEELFRD
GVNWGRIVAFFEFGGVMCVESVNREMSPLVDNIALWMTEYLNRHLHTWIQDNGGWDAFVELYGPSMRPLFDFSWLSLKTLLSLALVGACITLGAYLGHK
Tool: GPS
Slide48
Thank you!
Any questions?
Carrie Iwema: iwema@pitt.edu, 412-383-6887
Ansuman Chattopadhyay: ansuman@pitt.edu, 412-648-1297
http://www.hsls.pitt.edu/guides/genetics