Slide1
Andrew Cowley
External Services, EMBL-EBI
Sequence Searching and Alignments
Slide2
External Services
Sequence searching and alignments - Andrew Cowley
08/05/2012
Andrew Cowley
Bioinformatics Trainer
Hamish McWilliam
Software engineer
Rodrigo Lopez
Head of External Services
+ many others!
Slide3
Contents
Sequence databases
Database browsing tools
Similarity searching and alignments
Alignment basics
Similarity searching tools
More advanced tools
Alignment tools
Guidelines
Problem sequences
Slide4
Materials
www.ebi.ac.uk/~apc/Courses/Rotterdam
Slide5
Data
Simplistically, much of the data at the EBI can be thought of as a container:
One part is the raw data (e.g. sequence)
Another part is the annotation on this data
Slide6
Example
ID   AJ131285; SV 1; linear; mRNA; STD; INV; 919 BP.
XX
AC   AJ131285;
XX
DT   24-APR-2001 (Rel. 67, Created)
DT   20-JUL-2001 (Rel. 68, Last updated, Version 4)
XX
DE   Sabella spallanzanii mRNA for globin 3
XX
KW   globin; globin 3; globin gene.
XX
OS   Sabella spallanzanii
OC   Eukaryota; Metazoa; Annelida; Polychaeta; Palpata; Canalipalpata;
OC   Sabellida; Sabellidae; Sabella.
XX
RN   [1]
RP   1-919
RA   Negrisolo E.M.;
RT   ;
RL   Submitted (11-DEC-1998) to the EMBL/GenBank/DDBJ databases.
RL   Negrisolo E.M., Biologia, Universita degli Studi di Padova, via U. Bassi
RL   58/B, Padova,35131, ITALY.
FH   Key             Location/Qualifiers
FH
FT   source          1..919
FT                   /organism="Sabella spallanzanii"
FT                   /mol_type="mRNA"
FT                   /db_xref="taxon:85702"
FT   CDS             73..552
FT                   /gene="globin"
FT                   /product="globin 3"
FT                   /function="respiratory pigment"
FT                   /db_xref="GOA:Q9BHK1"
FT                   /db_xref="InterPro:IPR000971"
FT                   /db_xref="InterPro:IPR014610"
FT                   /db_xref="UniProtKB/TrEMBL:Q9BHK1"
FT                   /experiment="experimental evidence, no additional details
FT                   recorded"
FT                   /protein_id="CAC37412.1"
FT                   /translation="MYKWLLCLALIGCVSGCNILQRLKVKNQWQEAFGYADDRTSXGTA
FT                   LWRSIIMQKPESVDKFFKRVNGKDISSPAFQAHIQRVFGGFDMCISMLDDSDVLASQLA
FT                   HLHAQHVERGISAEYFDVFAESLMLAVESTIESCFDKDAWSQCTKVISSGIGSGV"
XX
SQ   Sequence 919 BP; 244 A; 246 C; 199 G; 225 T; 5 other;
     caaacagtca rttaattcac agagccctga ggtctctcgc tcctttctgc gtcactctct        60
     cttaccgtca tcatgtacaa gtggttgctt tgcctggctc tgattggctg cgtcagcggc       120
     tgcaacatcc tccagaggct gaaggtcaag aaccagtggc aggaggcttt cggctatgct       180
     gacgacagga catcccycgg taccgcattg tggagatcca tcatcatgca gaagcccgag       240
//
Slide7
Data - Nucleotide
ENA/EMBL-Bank:
Release and updates
Divided into classes and divisions
Supplementary sets: EMBL-CDS, EMBL-MGA
Specialist data sets, e.g.:
Immunoglobulins: IMGT/HLA, IMGT/LIGM, etc.
Alternative splicing: ASD, ASTD, etc.
Completed genomes: Ensembl, Integr8, etc.
Variation: HGVBase, dbSNP, etc.
Slide8
Nucleotides: European Nucleotide Archive (ENA)
The ENA has a three-tiered data architecture. It consolidates information from EMBL-Bank, the European Trace Archive (containing raw data from electrophoresis-based sequencing machines) and the Sequence Read Archive (containing raw data from next-generation sequencing platforms).
Figure adapted from: Cochrane, G. et al. Public Data Resources as the Foundation for a Worldwide Metagenomics Data Infrastructure. In: Metagenomics: Theory, Methods and Applications (Chapter 5), Caister Academic Press, Universidad Nacional de Cordoba, Argentina. Ed. D. Marco (2010).
Slide9
Nucleotides: EMBL-Bank
Keyword and sequence searching
Map-based search of environmental samples
Downloads
EMBL-Bank
DDBJ
GenBank
www.insdc.org
Direct submissions
Patents
Genome-sequencing projects
Updates
Third-party annotation
Slide10
EMBL-Bank: Classes
CON – Constructed Sequence
EST – Expressed Sequence Tag
GSS – Genome Survey Sequence
HTC – High Throughput cDNA
HTG – High Throughput Genome
PAT – Patent
STS – Sequence Tagged Site
STD – Standard
TPA – Third Party Annotation
TSA – Transcriptome Shotgun Assembly (from r95)
WGS – Whole Genome Shotgun
Slide11
EMBL-Bank: Taxonomic Divisions
ENV - Environmental
FUN - Fungi
INV - Invertebrate
HUM - Human
MAM – Mammal (excluding human, mouse and rodent)
MUS - Mouse
PHG - Phage
PLN - Plant
PRO - Prokaryote
ROD – Rodent (excluding mouse)
SYN - Synthetic
TGN – Transgenic
UNC - Unclassified
VRL - Viral
VRT – Vertebrate (excluding human, mammal, mouse and rodent)
Slide12
Data – Protein Sequence
UniProt databases:
UniProtKB: human curated and automatic translation sections
UniRef: non-redundant sequence clusters
UniParc: non-identical sequence archive
Sequence from structures:
PDB
SGT
Specialist data sets, e.g.:
Immunoglobulins: IMGT/HLA
Alternative splicing: ASD, ASTD
Completed proteomes: Ensembl, Integr8
Protein Interactions: IntAct
Patent Proteins: EPO, JPO, KIPO and USPTO
Slide13
Protein sequence: UniProt
Manual curation
Literature-based annotation
Sequence analysis
Automated annotation
Transmembrane prediction
InterPro classification
Signal prediction
Other predictions
Protein classification
Some data sources for annotation:
GO: functional info
PRIDE: protein identification data
InterPro: protein families and domains
IntAct: molecular interactions
IntEnz: enzymes
HAMAP: microbial protein families
RESID: post-translational modifications
Slide14
Databases
Many databases and they are getting bigger
Efficient searching involves knowledge of what is stored in these
Not everything in the databases is correct
Changes can happen...
Deletions, sequence modifications
Daily updates, identifier changes, etc.
Slide15
Searching databases
Slide16
Searching
Many ways of searching databases
Annotation/title
Know something about your sequence
Gene name
Function
Accession
Slide17
EBI Search
Slide18
EBI Search
Slide19
Database webpages
Slide20
Database searching
Slide21
Searching
Many ways of searching databases
Annotation/title
Know something about your sequence
Gene name
Function
Accession
Raw data
Don’t know!
Or want to check...
Infer extra information
Homology?
Annotation?
Function?
Slide22
Sequence alignment
Relatively easy if we have an exact match
... But sequence is variable
Between individuals, species, locations, etc.
That variability is useful data too!
Need a search method that allows for some variability
And even better, one that helps us assess that variability
Slide23
Sequence alignment
Query: ACATAGGT
1:     TCATAGAT
2:     AAATTCTG
Slide24
Sequence alignment
Query: ACATAGGT
1:     TCATAGAT
Query: ACATAGGT
2:     AAATTCTG
Slide25
Sequence alignment
Query: ACATAGGT
1:     TCATAGAT   Score: 6/8
Query: ACATAGGT
2:     AAATTCTG   Score: 3/8
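The 6/8 and 3/8 scores on this slide are simply counts of identical positions. As a minimal sketch (the function name `identity_score` is made up for illustration, and it assumes ungapped sequences of equal length), the count can be reproduced like this:

```python
def identity_score(query: str, subject: str) -> tuple[int, int]:
    """Count identical positions in two equal-length, ungapped sequences."""
    if len(query) != len(subject):
        raise ValueError("sequences must have the same length")
    matches = sum(1 for q, s in zip(query, subject) if q == s)
    return matches, len(query)


query = "ACATAGGT"
for name, subject in [("1", "TCATAGAT"), ("2", "AAATTCTG")]:
    matches, length = identity_score(query, subject)
    print(f"Sequence {name}: {matches}/{length}")
```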
Slide26
Sequence alignment
atttcacagaggaggacaaggctactatcacaagcctgtggggcaaggtgaatgtggaag
atgctggaggagaaaccctgggaaggctcctggttgtctacccatggacccagaggttct
ttgacagctttggcaacctgtcctctgcctctgccatcatgggcaaccccaaagtcaagg
cacatggcaagaaggtgctgacttccttgggagatgccattaaagcacctgggatgatct
caagggcacctttgcccagcttgagt
atggtgctctctgcagctgacaaaaccaacatcaagaactgctgggggaagattggtggc
catggtggtgaatatggcgaggaggccctacagaggatgttcgctgccttccccaccaccaagacctacttctctcacattgatgtaagccccggctctgcccaggtcaaggctcacggcaagaaggttgctgatgccctggccaaagctgcagaccacgtcgaagacctgcctggtgccctgtccactctgagcgacctgc
cacaagcctgtggggcaaggtgaatgtggaagatgctggaggagaaaccctgggaaggctcctggttgtntacccatggacccagaggttctttgacagctttggcaacctgtcctctgcctctgccatcatgggcaaccccaaagtcaaggcacatggcaagaaggtgctgacttcctt
gggagatgccataaagcacctggatgatctcaagggca
[Figure labels: Query, 1, 2]
Slide27
Dot plot
Maybe a dot plot will help
[Figure: dot plot of the Query (A C A T A G ...) against Sequence 1 (T C A T A G ...)]
Slide28
Dot plot
[Figure: two dot plots, Query vs Sequence 1 and Query vs Sequence 2]
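A dot plot can be sketched in a few lines: put the query along one axis, the other sequence along the other, and mark every cell where the two letters are identical. The sketch below is a plain-text illustration only (the helper name `dot_plot` is an assumption, not an EBI tool); a good match shows up as a long diagonal run of `*`.

```python
def dot_plot(query: str, subject: str) -> list[str]:
    """Return one text row per subject residue; '*' marks an identical letter."""
    rows = []
    for s in subject:
        rows.append("".join("*" if q == s else "." for q in query))
    return rows


query = "ACATAGGT"
for name, subject in [("Sequence 1", "TCATAGAT"), ("Sequence 2", "AAATTCTG")]:
    print(f"Query vs {name}")
    print("  " + query)
    for residue, row in zip(subject, dot_plot(query, subject)):
        print(residue + " " + row)
```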
Slide29
We can see the difference, but how do we turn that into something a computer can evaluate?
Computers rely on algorithms that give them a score
They can then compare scores
Slide30
Simple algorithm: penalise movement away from the diagonal (gap penalty)
[Figure: matrix paths; a step along the diagonal scores 0, a step off the diagonal scores -10]
Slide31
Gap extend penalty?
A single block of insertions/deletions is more likely than multiple in/del events
NVELKAET
NVDEATNFELKAET
NV-ELKAET
NVDE--A-TNFELKAET
NV------ELKAET
NVDEATNFELKAET
Slide32
To encourage this, we apply a high penalty just to open a gap and a low penalty for each position that extends it.
Gap open = 10
Gap extend = 0.5
[Figure: matrix paths annotated with gap penalties; the first gap position costs -10 (open) plus -0.5 (extend) = -10.5, and each further gap position adds another -0.5, e.g. -11 for two positions]
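To see why this favours one block of gaps over several scattered ones, here is a small sketch of the penalty. It follows the convention suggested by the numbers on this slide (a gap of n positions costs open + n × extend); be aware that some programs instead charge open + (n - 1) × extend, so treat the exact formula as an assumption.

```python
GAP_OPEN = 10.0
GAP_EXTEND = 0.5


def gap_penalty(length: int, gap_open: float = GAP_OPEN,
                gap_extend: float = GAP_EXTEND) -> float:
    """Penalty for one contiguous gap of `length` positions (slide convention)."""
    if length <= 0:
        return 0.0
    return -(gap_open + length * gap_extend)


# Six gap positions in a single block...
one_block = gap_penalty(6)
# ...versus the same six positions split over three separate gaps.
three_blocks = gap_penalty(2) + gap_penalty(1) + gap_penalty(3)
print(one_block, three_blocks)
```

The single block is penalised far less than the three separate gaps, which is the behaviour we want.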
Slide33
Match/mismatch
Of course, we also need to tell the algorithm that matching letters are better than mismatches
This is done via a scoring matrix
     A   C   G   T
A    5  -4  -4  -4
C   -4   5  -4  -4
G   -4  -4   5  -4
T   -4  -4  -4   5
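The matrix above can be written down as a simple lookup table. The sketch below (names such as `DNA_MATRIX` and `ungapped_score` are illustrative) scores the two example sequences from the earlier slides against the query without allowing any gaps.

```python
MATCH, MISMATCH = 5, -4

# Every pair of nucleotides maps to +5 (identical) or -4 (different).
DNA_MATRIX = {(a, b): (MATCH if a == b else MISMATCH)
              for a in "ACGT" for b in "ACGT"}


def ungapped_score(a: str, b: str, matrix=DNA_MATRIX) -> int:
    """Sum the matrix score at each aligned position (no gaps allowed)."""
    return sum(matrix[(x, y)] for x, y in zip(a, b))


print(ungapped_score("ACATAGGT", "TCATAGAT"))
print(ungapped_score("ACATAGGT", "AAATTCTG"))
```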
Slide34
Putting the two together gives us a scoring mechanism
[Figure: alignment matrix with a cumulative score in each cell, e.g. -4, 1, 6, -13.5, -14, -18.5, -23]
Slide35
To pick the optimal alignment, start at the end and trace back the highest scoring route.
[Figure: the same alignment matrix with the highest-scoring route traced back from the final cell]
Slide36
Needleman-Wunsch
Congratulations! You’ve just reconstructed the Needleman-Wunsch algorithm!
An example of dynamic programming
Comparing the full length of both sequences is called a global-global, or just global, alignment
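For reference, here is a compact sketch of a global alignment in this style. It is a simplification, not the EMBOSS implementation: it uses a single linear gap penalty (-10 per gap position) rather than separate open and extend penalties, and the match/mismatch values are taken from the DNA matrix shown earlier.

```python
def needleman_wunsch(a: str, b: str, match: int = 5, mismatch: int = -4,
                     gap: int = -10) -> tuple[int, str, str]:
    """Global alignment with a linear gap penalty; returns (score, top, bottom)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # leading gaps in b
        score[i][0] = i * gap
    for j in range(1, cols):          # leading gaps in a
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # diagonal
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the final cell along a highest-scoring route.
    top, bottom = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))
```

Every cell of the matrix is filled in, which is what makes the result optimal but also what makes the method expensive for long sequences.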
Slide37
Global vs Local
But global-global alignment might not be suitable for sequences of very different lengths
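A modified form of this algorithm for local alignment is called the Smith-Waterman algorithm.
It sets negative scores in the matrix to 0, which allows an alignment to end and restart anywhere; the trace back starts at the highest-scoring cell and stops when the score reaches 0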
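A minimal sketch of the local variant follows, again assuming a simple linear gap penalty and the 5/-4 DNA scores rather than the parameters of any particular EBI tool. The only changes from the global version are the floor at 0 and where the trace back starts and stops.

```python
def smith_waterman(a: str, b: str, match: int = 5, mismatch: int = -4,
                   gap: int = -10) -> tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, top, bottom)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(0,                            # never go negative
                              score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    # Trace back from the best cell until the score drops to 0.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("ATGTATACGC", "AGTATAGC"))
```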
Slide38
Global vs Local
A T G T A T A C G C
A G T A T A - G C
A - T G T A T A C G C
A G T A T A - - - G C
Slide39
Scoring
Parameters so far:
Match/mismatch
Gap opening
Gap extending
Can we improve it?
Slide40
Substitutions
Some substitutions are more likely than others
DNA:
Purines (A,G) – dual ring
Pyrimidines (C, T) – single ring
Substitutions within the same type (purine to purine, or pyrimidine to pyrimidine) are called transitions, whereas exchanging a purine for a pyrimidine (or vice versa) is called a transversion
Transitions occur more frequently than transversions, so we can score them higher in the scoring matrix
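As a sketch of how this could feed into the scoring matrix, the snippet below classifies each substitution and gives transitions a milder penalty than transversions. The numeric values (-1 and -4) are illustrative assumptions, not taken from the slides or from any published matrix.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Illustrative scores only: transitions are penalised less than transversions.
SCORES = {"match": 5, "transition": -1, "transversion": -4}


def classify(a: str, b: str) -> str:
    """Label a pair of nucleotides as match, transition or transversion."""
    if a == b:
        return "match"
    same_ring_type = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_ring_type else "transversion"


def substitution_score(a: str, b: str) -> int:
    return SCORES[classify(a, b)]


for pair in [("A", "A"), ("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(pair, classify(*pair), substitution_score(*pair))
```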
Slide42
Proteins
What about proteins?
Slide43
Protein substitution matrices
Can look at closely related proteins to determine substitution rates
Two most commonly used models:
BLOSUM
PAM
Slide44
BLOSUM
BLOcks of Amino Acid SUbstitution Matrix
Align conserved regions of evolutionarily divergent sequences clustered at a given % identity
Count relative frequencies of amino acids and substitution probabilities
Turn that into a matrix where the more positive a substitution score is, the more likely it is to be found, and the more negative, the less likely
Higher BLOSUM number = more closely related
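The "turn that into a matrix" step is a log-odds calculation: compare how often a pair is observed with how often it would be expected by chance. The sketch below uses made-up frequencies purely for illustration, treats the pair frequency as ordered (a aligned with b), and assumes scores scaled in half-bit units, which is how BLOSUM62 is conventionally reported.

```python
import math


def log_odds_score(pair_freq: float, freq_a: float, freq_b: float,
                   scale: float = 2.0) -> int:
    """Rounded log-odds score: scale * log2(observed / expected by chance)."""
    expected = freq_a * freq_b
    return round(scale * math.log2(pair_freq / expected))


# Hypothetical numbers: both residues make up 5% of the data.
common_swap = log_odds_score(pair_freq=0.02, freq_a=0.05, freq_b=0.05)
rare_swap = log_odds_score(pair_freq=0.000625, freq_a=0.05, freq_b=0.05)
print(common_swap, rare_swap)
```

A pair seen more often than chance gets a positive score; a pair seen less often gets a negative one.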
Slide45
PAM
Point Accepted Mutation
Observed mutations in a set of closely related proteins
Markov chain model created to describe substitutions
Normalised so that PAM1 = 1 mutation per 100 amino acids
Extrapolate matrices from model
Higher PAM number = less closely related
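The extrapolation step amounts to multiplying the PAM1 transition matrix by itself: PAM250 corresponds to applying PAM1 250 times. The toy below uses a hypothetical two-letter alphabet (the real matrix is 20 × 20 and derived from observed mutations) just to show how the probability of a residue staying unchanged decays as the PAM number grows.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a positive integer power by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy "PAM1": each residue has a 1% chance of mutating per step.
PAM1_TOY = [[0.99, 0.01],
            [0.01, 0.99]]

pam250_toy = mat_pow(PAM1_TOY, 250)
print(pam250_toy[0])
```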
PAM 250
Slide46
Effect of applying PAM10 -> 500 matrices to the human LDL receptor sequence
[Figure: panels for PAM 10, 100, 200, 300, 400 and 500]
Slide47
More divergent
BLOSUM 45    PAM 250
BLOSUM 62    PAM 160
BLOSUM 90    PAM 100
Less divergent
Slide48
Scoring
Parameters:
Match/mismatch
Gap opening
Gap extending
Substitution matrix
Slide49
Dynamic programming alignments at the EBI
EMBOSS Pairwise Alignment Algorithms
European Molecular Biology Open Software Suite
Suite of useful tools for molecular biology
Command line based
Designed to be used as part of scripts/chained programs
We implement selected tools to provide web-based access
Slide50
Where to find at the EBI?
http://www.ebi.ac.uk/Tools/psa
Or...
Slide51
Where to find at the EBI?
Slide52
Pairwise alignment tools
Global alignment: Needle, Stretcher
Local alignment: Water, Matcher, LALIGN
Slide53
[Figure: pairwise alignment submission form, with callouts for sequence input, changing the sequence type to nucleotide, parameters and the Submit button]
Slide55
Key to the alignment output:
- Gap
: Positive match
. Negative match
| Identity
Slide56
Example sequences
www.ebi.ac.uk/~apc/Courses/Rotterdam
Pairwise_align1.fsa
Pairwise_align2.fsa
Slide57
Dynamic programming sequence search methods at the EBI
Global alignment: GGSEARCH
Local alignment: SSEARCH
Global query vs local database: GLSEARCH
Slide58
Where to find at the EBI?
www.ebi.ac.uk/Tools/sss/
Or...
Slide60
Similarity search
[Figure: similarity search submission form, with callouts for database selection, sequence input, parameters and the Submit button]
Slide61
Dynamic programming methods are rigorous and guarantee an optimal result
But they take up a lot of memory
And they evaluate each position of the matrix
Predictably, this makes them slow and demanding when you are aligning large sequences
Slide62
Heuristics
Therefore we need methods of estimating alignments
Estimation methods are called heuristics
They try to take short cuts in an intelligent manner
Speed up the search
At the possible expense of accuracy
Accuracy in sequence searches is important for:
Aligning the right bits
Scoring the alignment correctly
Identifying similar sequences (sensitivity)
Slide63
Going back to our dot plot
Slide64
Instead of searching the whole matrix, if we narrow the search space down to a likely region we will improve the speed.
Slide65
Of course, we have to identify likely regions; not all alignments will be as nice as that one!
This is the method used by FASTA
W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
Slide66
FASTA – step 1
Identify runs of identical sequence and pick regions with highest density of runs
Ktup parameter:
The minimum length of a ‘word’ (run of identical residues) that is considered; shorter matches are ignored
Increase Ktup = faster, but less sensitive
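A rough sketch of this first step: index every word of length ktup in the query, look each database word up in that index, and count hits per diagonal (query position minus database position). This is only an illustration of the idea, not the FASTA source code, and the function name `ktup_diagonals` is made up.

```python
from collections import Counter, defaultdict


def ktup_diagonals(query: str, subject: str, ktup: int = 2) -> Counter:
    """Count identical ktup-length words shared by two sequences, per diagonal."""
    index = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        index[query[i:i + ktup]].append(i)
    diagonals = Counter()
    for j in range(len(subject) - ktup + 1):
        for i in index.get(subject[j:j + ktup], ()):
            diagonals[i - j] += 1     # same diagonal = same offset
    return diagonals


hits_k2 = ktup_diagonals("ACATAGGT", "TCATAGAT", ktup=2)
hits_k3 = ktup_diagonals("ACATAGGT", "TCATAGAT", ktup=3)
print(hits_k2.most_common())
print(hits_k3.most_common())
```

With the larger ktup there are fewer hits to process (faster), but the weak off-diagonal hit disappears (less sensitive).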
Slide67
FASTA – step 2
Weight scoring of runs using matrix, trim back regions to those contributing to highest scores
Parameter:
Substitution matrix
Slide68
FASTA – step 3
Discard regions too far from the highest scoring region
Joining threshold:
Internally determined
Slide69
FASTA – step 4
Use dynamic programming to optimise alignment in a narrow band encompassing the top scoring regions
Parameters:
Gap open
Gap extend
Substitution matrix
Slide70
FASTA
Repeat against all sequences in the database
Slide71
FASTA – programs available at EBI
FASTA: “a fast approximation to Smith & Waterman”
FASTA – scans a protein or DNA sequence library for similar sequences.
FASTX/Y – compares a DNA sequence to a protein sequence database, translating the DNA sequence in forward or reverse frames.
TFASTX/Y – compares a protein sequence to a translated DNA databank.
FASTF – compares ordered peptides (Edman degradation) to a protein databank.
FASTS – compares unordered peptides (Mass Spec.) to a protein databank.
SSEARCH – rigorous scan of a protein or DNA sequence library (S&W algorithm).
Slide72
Where to find at the EBI?
www.ebi.ac.uk/Tools/sss/
Or...
Slide75
Similarity search
[Figure: similarity search submission form, with callouts for database selection, sequence input, parameters and the Submit button]
Slide76
Example sequence
www.ebi.ac.uk/~apc/Courses/Rotterdam
test_prot.fasta
Slide77FASTA - resultsSequence searching and alignments - Andrew Cowley
08/05/2012
77
Slide78FASTA - resultsSequence searching and alignments - Andrew Cowley
08/05/2012
78
Slide79FASTA - resultsSequence searching and alignments - Andrew Cowley
08/05/2012
79
Slide80FASTA - resultsSequence searching and alignments - Andrew Cowley
08/05/2012
80
Key
-
Gap
: Identity
. Similarity
X Filtered
Slide81
BLAST – Basic Local Alignment Search Tool
Instead of narrowing the dynamic programming search space, BLAST works in a different way
First, it creates a list of words from the query, containing both the exact words and high-scoring substitutions of them
Altschul et al (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Slide82
BLAST – step 1
w=3
SEWRFKHIYRGQPRRHLLTTGWSTFVT
SEW
EWR
WRF
Parameter:
Word length (w)
Increase = faster, but less sensitive
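Building the exact word list is just a sliding window over the query. A minimal sketch (the helper name `query_words` is an assumption for illustration):

```python
def query_words(sequence: str, w: int = 3) -> list[str]:
    """Return every overlapping word of length w, in order of position."""
    return [sequence[i:i + w] for i in range(len(sequence) - w + 1)]


words = query_words("SEWRFKHIYRGQPRRHLLTTGWSTFVT", w=3)
print(words[:3])
print(len(words), "words of length 3")
```

A longer word length gives fewer, more specific words, which is why increasing w is faster but less sensitive.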
Slide83
BLAST – step 1 (cont'd)
w=3
T=13
SEWRFKHIYRGQPRRHLLTTGWSTFVT (word shown: GQP)
Neighbourhood words and their scores:
GQP 18
GEP 15
GRP 14
GKP 14
GNP 13
GDP 13
Below the threshold:
AQP 12
NQP 12
Parameters:
Neighbourhood threshold (T)
Substitution matrix
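The neighbourhood scores above can be reproduced by scoring each candidate word against GQP, position by position, and keeping those that reach T. The sketch below includes only the handful of BLOSUM62 entries needed for this example (a real search would use the full matrix and generate all possible words rather than a fixed candidate list).

```python
# Excerpt of BLOSUM62: (query residue, candidate residue) -> score.
BLOSUM62_EXCERPT = {
    ("G", "G"): 6, ("G", "A"): 0, ("G", "N"): 0,
    ("Q", "Q"): 5, ("Q", "E"): 2, ("Q", "R"): 1,
    ("Q", "K"): 1, ("Q", "N"): 0, ("Q", "D"): 0,
    ("P", "P"): 7,
}


def word_score(word: str, candidate: str, matrix=BLOSUM62_EXCERPT) -> int:
    """Sum of substitution scores between a query word and a candidate word."""
    return sum(matrix[(q, c)] for q, c in zip(word, candidate))


def neighbourhood(word: str, candidates: list[str], threshold: int):
    """Keep the candidates whose score against `word` is at least `threshold`."""
    scored = [(c, word_score(word, c)) for c in candidates]
    return [(c, s) for c, s in scored if s >= threshold]


candidates = ["GQP", "GEP", "GRP", "GKP", "GNP", "GDP", "AQP", "NQP"]
for candidate, score in neighbourhood("GQP", candidates, threshold=13):
    print(candidate, score)
```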
Slide84
BLAST – step 2
Then it scans database sequences for exact matches with these words
Slide85
BLAST – step 3
If two hits are found on the same diagonal, the alignment is extended until the score drops by a certain amount
This results in a High-scoring Segment Pair (HSP)
Parameters:
Drop off
Substitution matrix
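The extension step can be sketched as follows. This is a simplified, one-directional version using a plain match/mismatch score (5/-4) in place of a substitution matrix; BLAST itself extends in both directions and uses the chosen matrix. The extension stops once the running score has fallen more than `drop_off` below the best score seen so far.

```python
def simple_score(a: str, b: str) -> int:
    """Stand-in for a substitution matrix lookup."""
    return 5 if a == b else -4


def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 seed_len: int, drop_off: int = 10) -> tuple[int, int]:
    """Extend an ungapped hit to the right; return (best score, best length)."""
    score = sum(simple_score(query[q_start + k], subject[s_start + k])
                for k in range(seed_len))
    best_score, best_len = score, seed_len
    k = seed_len
    while q_start + k < len(query) and s_start + k < len(subject):
        score += simple_score(query[q_start + k], subject[s_start + k])
        if score > best_score:
            best_score, best_len = score, k + 1
        elif best_score - score > drop_off:
            break                      # score has dropped too far: stop
        k += 1
    return best_score, best_len


best_score, best_len = extend_right("ACGTACGTTTTT", "ACGTACGAAAAA",
                                    q_start=0, s_start=0, seed_len=3)
print(best_score, best_len)
```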
Slide86
BLAST – step 4
If the total HSP score is above another threshold then a gapped extension is initiated
Parameters:
Extension threshold (Sg)
Substitution matrix
Slide87
BLAST
The steps rule out many database sequences early on
Large increase in speed
Slide88
BLAST – programs available at the EBI
Basic Local Alignment Search Tool
NCBI-BLAST programs:
BLASTP – protein sequence vs. protein sequence library
BLASTN – nucleotide query vs. nucleotide database
BLASTX – translated DNA vs. protein sequence library
WU-BLAST programs:
BLASTP – protein query vs. protein database
BLASTN – nucleotide query vs. nucleotide database
BLASTX – translated nucleotide query vs. protein database
TBLASTN – protein query vs. translated nucleotide database
TBLASTX – translated nucleotide query vs. translated nucleotide database
Combines several parameters into ‘sensitivity’ option
Slide90
Example sequence
www.ebi.ac.uk/~apc/Courses/Rotterdam
test_prot.fasta
Slide91
Key to the alignment output:
- Gap
[residue] Identity
+ Similarity
X Filtered
Slide92
Differences between BLAST and FASTA
BLAST
Fast
Good with proteins
Produces good local alignments + short global alignments
Produces HSPs (reports internal matches in long sequences)
Might miss a potential alignment due to ruling out sequences early on in the process
Good at finding siblings
FASTA
Not as fast as BLAST
Much better with DNA than BLASTN
Produces S&W alignments
Checks each possible alignment with database sequences
Good at finding cousins
Slide93
When to use what?
[Figure: FASTA, WU-BLAST, NCBI BLAST and PSI-SEARCH placed on axes of database size against query length]
Slide94
When to use what?
[Figure: time to search with FASTA, WU-BLAST, NCBI BLAST and PSI-SEARCH across PDB, Swiss-Prot, UniRef50, UniRef90, UniRef100, UniProtKB and UniParc]
Slide95
Homology and Similarity
Slide96
Similarity
Slide97
Homology
Slide98
Unrelated!*
*OK, very distantly related!
Slide99
Homology vs. Similarity
Homology:
Presence of similar features because of common descent
Cannot be observed directly, since the ancestors no longer exist
Is inferred as a conclusion based on ‘similarity’
Homology is like pregnancy: Either one is or one isn’t! (Gribskov – 1999)
Similarity:
Quantifies a ‘likeness’
Uses statistics to determine ‘significance’ of a similarity
Statistically significant similar sequences are considered ‘homologous’
Slide100
So far, we’ve talked about scoring alignments
The score is a direct function of the algorithm
But what we want is to assign some kind of quality to that score
Slide101
Score vs significance
A A A
A A A
A C A T A A G G C T
A T A C A A G C C T
High score
High significance
Slide103
“Lies, damn lies, and statistics”
Not just interested in score...
...But how likely we are to get that alignment by chance alone
It is from this ‘non-random’ alignment that we infer homology
Statistics are used to estimate this chance
Slide104
E-value
‘Expect’ value
Expected number of alignments with at least this score that would be found by chance in a search of this size
Best measure of how biologically significant an alignment is
Used for ranking results by default
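For BLAST-style statistics the E-value can be computed from the normalised bit score S' as E = m × n × 2^(-S'), where m is the query length and n the database length. The sketch below uses raw lengths for simplicity; real programs use corrected "effective" lengths, so the numbers here are illustrative only.

```python
import math


def e_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits scoring at least `bit_score`."""
    return query_len * db_len * 2.0 ** (-bit_score)


def p_value(expect: float) -> float:
    """Probability of seeing at least one such chance hit: 1 - exp(-E)."""
    return -math.expm1(-expect)


for bits in (20, 30, 40):
    e = e_value(bits, query_len=1024, db_len=1024)
    print(bits, e, p_value(e))
```

Note that the same score gives a larger E-value in a larger database, which is one reason to search the smallest database likely to contain the sequences of interest.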
Slide105
Calculated in slightly different ways for BLAST and FASTA
Short alignments are more likely to be found by chance so have higher E-values
Affected by parameter values like gap penalties and substitution matrices
BLAST and FASTA both optimised for distant relationships
Slide106
FASTA statistics
Compares query sequence with every sequence in database
As most of these sequences are unrelated it is possible to use the distribution of scores to assign statistical significance
Slide107
FASTA - histogram
Key
* Predicted distribution of scores
= Observed distribution of scores
[Figure: score histogram with the high scoring region highlighted]
Slide108
BLAST statistics
Main reason for speed is that it doesn’t compare the query with lots of other sequences
Therefore it pre-estimates statistical values using a random sequence model
“Appears to yield fairly accurate results”
Slide109
Search Guidelines
Slide110
Search guidelines 1
Whenever possible, compare at the amino acid level rather than at the nucleotide level (fasta, blastp, etc…)
Then with translated DNA query sequences (fastx, blastx)
Search with DNA vs. DNA as the next resort
And then against translated DNA database sequences (tfastx, tblastx) as the VERY LAST RESORT!
Slide111
Search guidelines 2
Search the smallest database that is likely to contain the sequence(s) of interest
Use sequence statistics (E()-values), rather than % identity or % similarity, as your primary criterion for sequence homology
Slide112
Search guidelines 3
Check that the statistics are likely to be accurate by looking for the highest scoring unrelated sequence
Examine the histograms
Use programs such as prss3 to confirm the expectation values.
Search with shuffled sequences (use MLE/Shuffle in fasta), which should have an E() ~1.0
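The idea behind shuffling can be shown with a toy experiment: shuffle one sequence many times (keeping its composition), rescore each shuffle against the query, and see how often chance alone does as well as the real pairing. This sketch is not prss3 or the FASTA shuffle option; it uses a crude identity count as the score and hypothetical helper names.

```python
import random


def identity(a: str, b: str) -> int:
    """Crude score: number of identical positions."""
    return sum(x == y for x, y in zip(a, b))


def shuffle_sequence(seq: str, rng: random.Random) -> str:
    """Return a random permutation of seq (same composition, order destroyed)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def empirical_p(query: str, subject: str, trials: int = 1000,
                seed: int = 0) -> float:
    """Fraction of shuffled subjects scoring at least as well as the real one."""
    rng = random.Random(seed)
    real = identity(query, subject)
    hits = sum(identity(query, shuffle_sequence(subject, rng)) >= real
               for _ in range(trials))
    return (hits + 1) / (trials + 1)


print(empirical_p("ACATAGGT", "TCATAGAT"))
print(empirical_p("ACATAGGT", "AAATTCTG"))
```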
Slide114
Search guidelines 4
Default parameters are set up for most common queries
Consider searches with different gap penalties and other scoring matrices, especially for short queries/domains
Use shallower matrices and/or more stringent gaps in order to uncover or force out relationships in partial sequences
Use BLOSUM62 instead of BLOSUM50 (or PAM100 instead of PAM250)
Remember to change the gap penalty defaults!
MATRIX     open  ext.
BLOSUM50   -10   -2
BLOSUM62   -11   -1
BLOSUM80   -16   -4
PAM250     -10   -2
PAM120     -16   -4
Slide115
Search guidelines 5
Homology can be reliably inferred from statistically significant similarity
But remember:
Orthologous sequences have similar functions
Paralogous sequences can acquire very different functional roles
So further work might be needed to tease out details
Slide117
Search guidelines 6
Consult motif or fingerprint databases in order to uncover evidence for conservation-critical or functional residues
However, motif identity in the absence of significant sequence similarity usually occurs by chance
Try to produce multiple sequence alignments in order to examine the relatedness of your sequence data
ClustalW/Omega
MUSCLE
T-Coffee
Kalign
MAFFT
Mview (available from EBI FASTA & BLAST services)
DBCLUSTAL (available from EBI BLAST services)
Slide118
Advanced
Slide119
In general, the more information we can add to an alignment, the better the result
Conserved regions
Structural information
Motifs
[R, T or D]-[D, A or Q]-[F, E or A]-A-T-H
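A motif written like this translates directly into a regular expression, with one character class per variable position. A minimal sketch (the test sequences are invented for illustration):

```python
import re

# [R, T or D]-[D, A or Q]-[F, E or A]-A-T-H as a regular expression.
MOTIF = re.compile(r"[RTD][DAQ][FEA]ATH")


def find_motif(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(sequence)]


print(find_motif("MKRDFATHLLTAEATHG"))
print(find_motif("MKRDFGTHLL"))
```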
Slide120
Conserved regions
We can add a new ‘position’ parameter to the substitution matrix
We can even modify a normal search to generate a position specific scoring matrix, or PSSM
Slide121
PSI-BLAST
Position Specific Iterative – BLAST:
Takes the result of a normal BLAST
Aligns them and generates profile of conserved positions
Uses this to weight scoring on next iteration
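The profile built in each iteration can be pictured as one column of scores per alignment position. The sketch below is a simplified illustration, not PSI-BLAST's actual weighting scheme: it assumes an ungapped alignment, a uniform background frequency of 1/20 per amino acid, and a pseudocount of 1 so that unseen residues do not score minus infinity.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: list[str], pseudocount: float = 1.0) -> list[dict]:
    """One {residue: log-odds score} dictionary per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        column = {res: math.log2(((counts[res] + pseudocount) / total) / background)
                  for res in AMINO_ACIDS}
        pssm.append(column)
    return pssm


def pssm_score(pssm: list[dict], sequence: str) -> float:
    """Score a sequence of the same length as the profile."""
    return sum(column[res] for column, res in zip(pssm, sequence))


hits = ["RDFATH", "TDEATH", "DAAATH", "RQFATH"]   # hypothetical aligned hits
pssm = build_pssm(hits)
print(round(pssm_score(pssm, "RDFATH"), 2), round(pssm_score(pssm, "WWWWWW"), 2))
```

Conserved columns (here the final A-T-H) contribute strongly positive scores for the conserved residue, which is how conserved positions gain weight in the next iteration.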
Slide122
PSI-BLAST
By adding importance to conserved residues we might be able to find more distant sequences
But iterate too far and we might be assigning importance where there is none
More sensitive
Slide126
PHI-BLAST
Pattern Hit Initiated-BLAST
User provides a pattern alongside a protein
Database hits have to contain this pattern and show similarity to the rest of the sequence
Results can initiate a PSI-BLAST search as well
Slide127
Problem Sequences
Slide128
Short sequences
What about short sequences?
Depends on their nature:
Protein
Use shallow matrices
Reduce word length and/or increase the E() value cut off
DNA
Reduce the word length
Ignore gap penalties (force local alignments only)
Use rigorous methods
But ask what you are trying to do!
Slide129
Low complexity regions
Biologically irrelevant, but likely to skew alignment scoring
E.g. CA repeats, poly-A tails and Proline rich regions
Sequence searching and alignments - Andrew Cowley
08/05/2012
129
Slide130Sequence searching and alignments - Andrew Cowley
08/05/2012
130
Good Statistics:
The inset shows good correlation
between the observed over expected
numbers of scores.
This is the region of the histogram to
look out for first when evaluating results.
Slide131
Bad Statistics:
The inset shows bad correlation between the observed and expected scores in this search.
The spaces between the = and * symbols indicate this poor correlation.
One reason for this can be low complexity regions.
Slide132Low complexity regions
Biologically irrelevant, but likely to skew alignment scoring
E.g. CA repeats, poly-A tails and proline-rich regions
Compensate by filtering sequence so these regions don’t contribute to scoring
Filters: seg, xnu, dust, CENSOR
But check what you are filtering!
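A simplified illustration of filtering, assuming a sliding-window Shannon entropy test. This is not the algorithm used by seg, xnu, dust or CENSOR, and the window size, threshold and sequence are arbitrary choices:

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Entropy in bits of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace every residue in a low-entropy window with X."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = "X"
    return "".join(masked)

# Hypothetical protein with a proline-rich stretch in the middle.
seq = "MKTAYIAKQRQISFVKSHFSRQ" + "P" * 15 + "LEERLGLIEVQAPILSRVGDGT"
masked = mask_low_complexity(seq)
print(masked)
```

Printing the masked sequence is the "check what you are filtering" step: the masked region extends a little into the flanks, which may or may not be what you want.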
Slide133
Filtered:
Inset showing the effect of using a low complexity filter (seg) and searching the database using the segment with highest complexity.
Note that there is now good agreement between the observed and expected high score in the search and that the distance between = and * has been significantly reduced.
Slide134Example sequence
www.ebi.ac.uk/~apc/Courses/Rotterdam
Filtertest_seq.fsa
Slide135HOE!
Homologous Over-Extension
Possible side effect of iteration-based methods
Slide141Reducing HOE
Look for domains in results and manually select the sequences that go into the PSSM
Mask boundaries according to initial alignment
Results in fewer false positives (improved selectivity)
Slide142PSI-SEARCH
Smith-Waterman implementation (SSEARCH)
With iterative position specific scoring
Optional boundary masking to reduce HOE
Slide143PSI-Search
Slide144Vector contamination
You think you know what your sequence is...
...but the results are really confusing!
Maybe you have vector contamination
Search against known vectors to check
Slide145Vector contamination
Slide146Example sequences
www.ebi.ac.uk/~apc/Courses/Rotterdam
vectortest_seq1.fsa
vectortest_seq2.fsa
Slide147Multiple Sequence Alignments
Slide148Uses of MSA
Functional prediction
Phylogeny
Structural prediction
Protein analysis
To distinguish between orthology and paralogy
Slide149
Ideally, might think to build up multiple alignments through weighted sum of pairs (pairwise scores)
But this is too computationally intensive
And doesn’t make much biological sense
Slide150Weighted Sums of Pairs: WSP
Human beta       --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST
Horse beta       --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN
Human alpha      ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-
Horse alpha      ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-
Whale myoglobin  ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT
Lamprey globin   PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT
Lupin globin     --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE
                           *:  :   :   *  .           :  .:   * :   *  :   .

Human beta       PDAVMGNPKVKAHGKKVLGAFSDGLAHLDN-----LKGTFATLSELHCDKLHVDPENFRL
Horse beta       PGAVMGNPKVKAHGKKVLHSFGEGVHHLDN-----LKGTFAALSELHCDKLHVDPENFRL
Human alpha      ----HGSAQVKGHGKKVADALTNAVAHVDD-----MPNALSALSDLHAHKLRVDPVNFKL
Horse alpha      ----HGSAQVKAHGKKVGDALTLAVGHLDD-----LPGALSNLSDLHAHKLRVDPVNFKL
Whale myoglobin  EAEMKASEDLKKHGVTVLTALGAILKKKGH-----HEAELKPLAQSHATKHKIPIKYLEF
Lamprey globin   ADQLKKSADVRWHAERIINAVNDAVASMDDT--EKMSMKLRDLSGKHAKSFQVDPQYFKV
Lupin globin     VP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVAD-AHFPV
                       . .:: *.  :   .                  :  *.  *  .       : .

Human beta       LGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH------
Horse beta       LGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH------
Human alpha      LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------
Horse alpha      LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR------
Whale myoglobin  ISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG
Lamprey globin   LAAVIADTVAAG---D------AGFEKLMSMICILLRSAY-------
Lupin globin     VKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---
                 :   :  .:      .      ..       .   :

Sequences  Time
2          1 second
3          150 seconds
4          6.25 hours
5          39 days
6          16 years
7          2404 years
Time: O(L^N)
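The table can be roughly reproduced under two assumptions inferred from its rows rather than stated on the slide: each extra sequence multiplies the work by L = 150, and two sequences take 1 second. A minimal sketch:

```python
# Assumptions: L = 150 (ratio between table rows), 1 second for two sequences.
L = 150

def wsp_seconds(n_sequences, length=L):
    """Estimated run time in seconds if time grows as O(L^N)."""
    return length ** (n_sequences - 2)

SECONDS_PER_YEAR = 365.25 * 24 * 3600

for n in range(2, 8):
    seconds = wsp_seconds(n)
    print(n, seconds, "s =", round(seconds / 3600, 2), "h =",
          round(seconds / SECONDS_PER_YEAR, 2), "years")
```

Each added sequence costs another factor of L, which is why the exhaustive approach is abandoned after a handful of sequences.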
Slide151
Ideally, might think to build up multiple alignments through weighted sum of pairs (pairwise scores)
But this is too computationally intensive
And doesn’t make much biological sense
So use heuristics and progressive alignment methods
Slide152ClustalW
>60,000 citations
Clustal1-Clustal4, 1988: Paul Sharp, Dublin
Clustal V, 1992: EMBL Heidelberg, Rainer Fuchs, Alan Bleasby
Clustal W, Clustal X, 1994-2005: Toby Gibson, EMBL, Heidelberg; Julie Thompson, IGBMC, Strasbourg
Clustal W and Clustal X 2.0, 2006: University College Dublin
www.clustal.org
Slide153CLUSTAL
Quick, pairwise alignment of all sequences
Line up pairs, with the most similar first
Slide154CLUSTAL
Fix the alignment between pairs and treat as one sequence
Slide155CLUSTAL
Align your fixed pairs with each other
Slide156
Note, this is not a phylogram!
Only a guide tree for the alignment
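The order in which a progressive method joins sequences can be sketched as follows. This is a simplified stand-in: similarity comes from Python's `difflib` rather than real pairwise alignments, and clusters are joined greedily by average similarity, which is not the tree-building method Clustal itself uses. The sequences are the N-terminal fragments from the globin alignment shown earlier:

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a, b):
    """Quick similarity in [0, 1]; a stand-in for a pairwise alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def join_order(seqs):
    """Greedily merge the two most similar clusters until one remains."""
    clusters = {name: (name,) for name in seqs}
    merges = []
    while len(clusters) > 1:
        def avg_sim(pair):
            x, y = pair
            sims = [similarity(seqs[i], seqs[j])
                    for i in clusters[x] for j in clusters[y]]
            return sum(sims) / len(sims)
        a, b = max(combinations(clusters, 2), key=avg_sim)
        merged = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b))
        clusters["(" + a + "," + b + ")"] = merged
    return merges

globins = {
    "Human beta": "VHLTPEEKSAVTALWGKVN",
    "Horse beta": "VQLSGEEKAAVLALWDKVN",
    "Human alpha": "VLSPADKTNVKAAWGKVGAH",
    "Horse alpha": "VLSAADKTNVKAAWSKVGGH",
}
merges = join_order(globins)
for step in merges:
    print(step)
```

The merge list only fixes the order of alignment steps. It says nothing reliable about evolutionary distance, which is the point of the warning above.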
Slide157ClustalW at the EBI
Slide159MSA
Sequence input
Parameters
Submit!
Slide160ClustalW
Slide161ClustalW
Slide162Jalview
Slide163ClustalW
Advantages
Fast
Not too demanding
Widely used
Fine for most uses
Disadvantages
Fixing of early alignments
Propagates errors
Doesn’t search far
Local minima
Compresses gaps
Slide164Example sequences
www.ebi.ac.uk/~apc/Courses/Rotterdam
Prot_MSA.fsa
Slide165Example sequences
www.ebi.ac.uk/~apc/Courses/Rotterdam
Problem_MSA.fsa
Slide166NEW!: Clustal Omega
Completely different way of doing things from ClustalW
Two major areas of improvement:
1) Guide tree generation
2) Profile-profile alignments
Slide167Clustal Omega – Guide Tree improvements
Guide tree generation is one of the slowest steps
Especially with large numbers of sequences
Clustal Omega uses the mBed method to sample a range of sequences and represent all sequences as vectors of distances to these samples
Results in better scaling with more sequences
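A toy sketch of the embedding idea, assuming a crude `difflib` distance and an arbitrary choice of seeds. The names and sequences are hypothetical and this is not Clustal Omega's implementation:

```python
from difflib import SequenceMatcher

def distance(a, b):
    """Crude distance in [0, 1]; a stand-in for a real sequence distance."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()

def embed(seqs, n_seeds=2):
    """Represent each sequence by its distances to a few sampled seed sequences."""
    names = list(seqs)
    step = max(1, len(names) // n_seeds)
    seeds = names[::step][:n_seeds]   # assumption: evenly spaced sample
    vectors = {name: [distance(seqs[name], seqs[s]) for s in seeds]
               for name in names}
    return seeds, vectors

toy = {
    "s1": "VHLTPEEKSAVTALWGKVN",
    "s2": "VQLSGEEKAAVLALWDKVN",
    "s3": "VLSPADKTNVKAAWGKVGAH",
    "s4": "VLSAADKTNVKAAWSKVGGH",
}
seeds, vectors = embed(toy, n_seeds=2)
for name, vec in vectors.items():
    print(name, [round(v, 2) for v in vec])
```

With N sequences and t seeds this needs N × t comparisons instead of all N × N pairs, and the short vectors can then be clustered quickly to give a guide tree.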
Slide168Clustal Omega – Profile-profile alignments
As in sequence searching, profiles can be used to increase sensitivity
HMMs are a form of profile
Clustal Omega aligns HMMs to HMMs
Slide169Clustal Omega
Better scaling for many sequences
Speed
Accuracy
Better scaling for many computers
More accurate alignments
But currently protein only
Slide170Other Tools
Slide171COFFEE
Consistency based Objective Function For alignmEnt Evaluation
Maximum Weight Trace (John Kececioglu)
Maximise similarity to a LIBRARY of residue pairs
Notredame, C., Holm, L. and Higgins, D.G. (1998) COFFEE: An objective function for multiple sequence alignments. Bioinformatics 14: 407-422.
Slide172COFFEE
Library of reference pairwise alignments
For your given set of sequences
Objective Function
Evaluates consistency between multiple alignment and the library of pairwise alignments
Use SAGA to optimise this function
Weight depending on quality of alignment
SAGA is another alignment method, using genetic algorithms
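The consistency idea can be sketched as the fraction of residue pairs aligned in the multiple alignment that also appear in the pairwise library. This is an unweighted simplification (the published function also weights each pairwise alignment by its quality), and the toy sequences and alignments are made up:

```python
from itertools import combinations

def residue_pairs(row_a, row_b):
    """Index pairs (i, j) of residues aligned to each other in two gapped rows."""
    pairs, i, j = set(), 0, 0
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        i += a != "-"
        j += b != "-"
    return pairs

def consistency(msa, library):
    """Fraction of the MSA's aligned residue pairs that are found in the library."""
    found = total = 0
    for x, y in combinations(sorted(msa), 2):
        pairs = residue_pairs(msa[x], msa[y])
        total += len(pairs)
        found += len(pairs & library.get((x, y), set()))
    return found / total if total else 0.0

# Hypothetical library built from three pairwise alignments of A, B and C.
library = {
    ("A", "B"): residue_pairs("ACDE", "A-DE"),
    ("A", "C"): residue_pairs("ACDE", "AC-E"),
    ("B", "C"): residue_pairs("A-DE", "AC-E"),
}
good = {"A": "ACDE", "B": "A-DE", "C": "AC-E"}
bad = {"A": "ACDE", "B": "AD-E", "C": "AC-E"}
print(consistency(good, library), round(consistency(bad, library), 2))
```

An optimiser then searches for the multiple alignment with the highest score, which is where the cost comes from.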
Slide173COFFEE
More accurate than ClustalW
Much less prone to problems in early alignment stages
VERY slow!
Slide174T-Coffee
Tree-based COFFEE
Heuristic approach to COFFEE
Gets rid of genetic algorithm portion
Uses progressive alignments
Changes algorithm based on number of sequences
Slide175T-Coffee
Much faster than COFFEE
Avoids some of ClustalW’s pitfalls
Can take information from several data sources
Still not that fast
Can be very demanding of memory etc.
Slide176Others
MUSCLE – Bob Edgar
Iterative/progressive alignment
Fast
Good for big alignments, proteins
MAFFT
Iterative, based on Fast Fourier Transform
Fast and accurate
Good for huge alignments
Kalign
Very fast, local-regions aligning
Good for very large numbers of alignments!
Slide177Which tool should I use?
Input data                                  Recommendation
2-100 sequences of typical protein length   MUSCLE, T-Coffee, MAFFT, ClustalW/Omega
100-500 sequences                           Clustal Omega, MUSCLE, MAFFT
>500 sequences                              Clustal Omega, KALIGN
Small number of unusually long sequences    ClustalW, KALIGN
Slide178How to evaluate?
Use a benchmark, e.g. BaliBASE
Slide179BaliBASE
Thompson, JD, Plewniak, F. and Poch, O. (1999)
NAR and Bioinformatics
IGBMC Strasbourg
141 manual alignments using structures
5 sections
core alignment regions marked
1. Equidistant (82)
2. Orphan (23)
3. Two groups (12)
4. Long internal gaps (13)
5. Long terminal gaps (11)
Slide180Benchmark pitfalls
Benchmark dataset may not be representative
Danger of over-training towards benchmark
Goldman: Most MSAs have unrealistic gaps
Tend towards multiple, independent deletions
Insertions are rare
Sequences shrink in length over evolution
No supporting evidence that this is the case
Slide181Solutions
Use phylogenetic data to guide alignment
Keep track of changes to ancestor sequences
Don’t change them again so easily in descendants
Slide182PRANK
Probabilistic Alignment Kit
webPRANK
Better suited for closely related sequences
Tied solutions are resolved by choosing one at random
Avoids incorrect confidence in result
Means alignments might not be reproducible
Alignments look quite different
Might look worse!
But gap patterns make sense
Gaps are good!
Slide185Common problems with MSA
Input format
Try using FASTA format
Unique sequence identifiers
Include sequence!
Usually limit of 500 sequences/1MB
Job can’t be found/other error
Results deleted after 7 days
Some sequence/program combinations run out of memory
Use a different program
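The input checks above can be sketched as a small pre-submission check. The function name is hypothetical, and the default limit of 500 is taken from the "usually" figure above, so the limit of the actual service may differ:

```python
def check_fasta(text, max_sequences=500):
    """Return a list of problems found in FASTA-formatted text."""
    problems, seen = [], set()
    current, has_seq, n_records = None, False, 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None and not has_seq:
                problems.append(f"{current}: no sequence")
            fields = line[1:].split()
            current = fields[0] if fields else ""
            n_records += 1
            if not current:
                problems.append("empty identifier")
            elif current in seen:
                problems.append(f"{current}: duplicate identifier")
            seen.add(current)
            has_seq = False
        elif current is None:
            problems.append("sequence data before first '>' header")
        else:
            has_seq = True
    if current is not None and not has_seq:
        problems.append(f"{current}: no sequence")
    if n_records > max_sequences:
        problems.append(f"{n_records} sequences exceeds limit of {max_sequences}")
    return problems

ok = ">a\nACGT\n>b\nACGT\n"
broken = ">a\nACGT\n>a\n>c\nAC\n"
print(check_fasta(ok), check_fasta(broken))
```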
Slide186Common mis-uses of MSA
Performing a sequence assembly
Specialist type of MSA
Use other tools (Staden etc.)
Aligning ESTs to a reference genome
Use EST2Genome
Designing primers
Use primer tools (primer3 etc.)
Aligning two sequences
Use a pairwise alignment tool!
Slide187Putting it all together
EBI Search
Sequence retrieval
Sequence search
Sequences retrieval
Multiple sequence alignment
Analysis
Slide188Final remarks
Don’t assume a single tool will cater for all your needs
Change the parameters of the tools
Remember where the tool excels and what its limitations are
A tool intended for specific task A can also be used for task B (and may be better than the tool intended for task B specifically!)
Crazy input will always give crazy results!
Slide189Getting Help
Slide190Getting Help
Database documentation
Frequently Asked Questions
http://www.ebi.ac.uk/help/faq.html
2can Support Portal
http://www.ebi.ac.uk/2can/
EBI Support
http://www.ebi.ac.uk/support/
Hands-on training programme
http://www.ebi.ac.uk/training/handson/
Slide191Thanks!
Hamish McWilliam
Vicky Schneider
Rodrigo Lopez
EMBL-EBI
SLING
Slide192Survey!
https://www.surveymonkey.com/s/SequenceSearching