Hands on selected bioinformatics - PowerPoint Presentation

Uploaded by hanah (@hanah) on 2024-02-09

Presentation Transcript

1. Hands on selected bioinformatics software
Marjana Westergren, Tine Grebenc

2. Molecular markers
Molecular marker = a fragment of DNA that is associated with a certain location within the genome or with another characteristic of an organism
Heritable DNA sequence differences (polymorphisms)
Phenotypically neutral, developmentally and environmentally stable
Detectable
Level of resolution?

3. DNA extraction
Fungal cell components:
  Nucleic acids
  Lipids
  Proteins
  Sugars and other water-soluble components
Lysis of cells and cell walls (buffer, detergent, 65 ºC)
Precipitation of proteins and removal of lipids (chloroform + ethanol / phenol)
Physical separation of cell walls and the water-phase solution
Precipitation in 2-propanol
Washing and resuspending of DNA
Storage (4 ºC or -80 ºC)
Sample in buffer
Organic solvent
Sedimentation
Removal of water phase
Sedimentation
Resuspending of DNA
Kits:
Manual procedure:

4. PCR – Polymerase Chain Reaction
PCR – the principle
Nuclear ribosomal ITS region – commonly used in identification and phylogeny
http://www.mun.ca/biology/scarr/PCR_sketch_3.gif

5. DNA sequencing - "reading" of the nucleotide sequence
Sequencing principle
http://www.newscientist.com/...
Analysis of raw sequence (above):
  Commercial programs: Sequencher (demonstration), Sequencing Analysis software (Applied Biosystems)
  Freely available: FinchTV
Output: sequence in "FASTA" format (= text format)

6. "FASTA" format (= text format):
>Unknown sample
CATTACCAATATCTGGGATGCCAAAGACACAGGCTCCCGATAAAACACATTTATGCGTATCCTCCCATGTTGCTTTCCCAGGCCAGCGGCCACTGCTGCCAGCCATGCCGTTTTTCGGTTACATGGTTGAGGTGCTTGGGGAAGGGCTAATTATCAAACTTTACTTCACCTTATTGTCTGAGAAGGCCATGTGCCGTAATCTTTAAACATGTTAAAACTTTCAACAACGGATCTCTTGGCTCTCGCATCGATGAAGAACGCAGCGAAATGCGATAAGTAATGTGAATTGCAGAATTCAGTGAATCATCGAATCTTTGAACGCACATTGCGCCCTTTGGTATTCCTTAGGGCATGCCTGTTCGAGCGTCGCAAAAACCCAGATCACCTAGAGTGTGGTATTGGCAGAAGTGGCCGGGGCTATCAGCGCTGCTGCCACTCTGCTGGAATGAATAGGCTGGAAAAGTAGATCATAGCAACAGACTTTCACAGTATTTTGAAATGCTAAATTAGTTTGAAGCTGATCGGAACCTAAGCCATTTGACCCCCATCCTGCGTAAAGCAGTAAGGTTGACCTCGGATCAGGTAGGGATACCCGCTGAACTTAAGCATATA
Base-to-base comparison of the nucleotide sequence with available databases:
http://www.ncbi.nlm.nih.gov/ (general, for all organisms and markers)
https://unite.ut.ee/ (only for the ITS region in fungi)
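Because FASTA is plain text (a ">" header line followed by sequence lines), it can be read with a few lines of code. The following is a minimal sketch, not the parser of any particular program; the function name `read_fasta` is an assumption made for illustration, and the example input is the first 40 bases of the sequence above split over two lines.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


example = StringIO(
    ">Unknown sample\n"
    "CATTACCAATATCTGGGATG\n"
    "CCAAAGACACAGGCTCCCGA\n"
)
records = list(read_fasta(example))
```

Each record is a plain tuple, so the sequences can be passed on to an alignment step without any further conversion.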

7. Step 1 - Alignment of sequences
Grouping together sequences with the fewest evolutionary differences
Pairwise and multiple alignment; most programs use both approaches
Specialised tools available for multiple alignment: http://www.ebi.ac.uk/Tools/msa/
Online:
  + faster; no need for a powerful computer (processor power)
  + usually more user-friendly (in terms of input/output)
  - limited number of characters/sequences
Local:
  + more flexible in terms of amount of data
  - less user-friendly, some with poor error messages
  - computationally demanding
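To make the idea of pairwise alignment concrete, here is a minimal sketch of global alignment by dynamic programming (the Needleman-Wunsch approach). It is not what the EBI tools run internally; the scoring values (match +1, mismatch -1, gap -2) and the function name `global_alignment` are assumptions chosen for illustration.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global pairwise alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align the two characters
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best_score, top, bottom = global_alignment("ACGT", "AGT")
```

Multiple-alignment programs typically build on many such pairwise comparisons, which is why the computing demand grows quickly with the number and length of sequences.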

8. Step 2 - Phylogenetic analysis
Which phylogenetic method to use:
Detect similarity based on the multiple alignment
  a. strong similarity -> Maximum Parsimony
  b. weak (distant) similarity -> Distance methods
  c. very weak similarity -> Maximum Likelihood
Regardless of the method used, always check the validity of the results (statistics or comparison of topologies among different approaches).

9. Step 2 - Phylogenetic analysis
a. Maximum Parsimony
A non-parametric approach, based on the principle of minimum evolution (fewest changes)
Good for similar sequences
It builds a single phylogenetic tree that explains the evolution to the present state with the fewest changes required, and groups sequences with a similar amount of variation
Uses only parsimony-informative characters (positions with at least two different states, each present in at least two sequences)
Less suitable for larger datasets
It does not give branch lengths, only topology
Not statistically consistent
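Which alignment columns count as parsimony-informative can be checked directly. The sketch below assumes the usual definition (at least two different nucleotides, each occurring in at least two sequences) and ignores gaps and ambiguity codes; the function name and the toy alignment are made up for illustration.

```python
from collections import Counter


def parsimony_informative_sites(alignment):
    """Return the 0-based column indices that are parsimony-informative."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    informative = []
    for pos in range(length):
        # Count only unambiguous nucleotides in this column.
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in "ACGT")
        shared_states = [base for base, c in counts.items() if c >= 2]
        if len(shared_states) >= 2:
            informative.append(pos)
    return informative


toy_alignment = ["ACGTA", "ACGTA", "ATGAA", "ATGAC"]
sites = parsimony_informative_sites(toy_alignment)
```

In the toy alignment the last column differs in only one sequence (a singleton), so it is variable but not informative for parsimony.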

10. Step 2 - Phylogenetic analysis
b. Distance methods
A non-parametric method
Based on a distance matrix of all compared sequences, followed by construction of a guide tree (clustering of distances) and subsequent iterative build-up of branches and nodes
Simple algorithms exist to construct a tree directly from pairwise distances (UPGMA - unweighted pair group method with arithmetic mean, and NJ - neighbor joining)
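As an illustration of how a tree can be built directly from pairwise distances, here is a minimal UPGMA sketch. It assumes already aligned sequences, uses the simple proportion of differing sites (p-distance) as the distance, and returns only the topology in Newick notation without branch lengths. The function names and the four toy sequences are assumptions, not part of any program named in this presentation.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names, seqs):
    """Cluster sequences with UPGMA and return the topology as Newick text."""
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    dist = {}
    for (a, sa), (b, sb) in combinations(list(zip(names, seqs)), 2):
        dist[(a, b)] = p_distance(sa, sb)
    while len(sizes) > 1:
        a, b = min(dist, key=dist.get)  # closest pair of clusters
        merged = f"({a},{b})"
        new_dist = {}
        for c in sizes:
            if c in (a, b):
                continue
            d_ac = dist.get((a, c), dist.get((c, a)))
            d_bc = dist.get((b, c), dist.get((c, b)))
            # Average weighted by cluster size (the "arithmetic mean" in UPGMA).
            new_dist[(merged, c)] = (
                sizes[a] * d_ac + sizes[b] * d_bc
            ) / (sizes[a] + sizes[b])
        # Drop every distance that involved the two merged clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


tree = upgma(
    ["A", "B", "C", "D"],
    ["ACGTACGT", "ACGTACGA", "TCGAACTA", "TCGAACTT"],
)
```

Neighbor joining follows the same overall loop but picks the pair to merge with a corrected criterion rather than the smallest raw distance.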

11. Step 2 - Phylogenetic analysis
c. Maximum likelihood (ML)
A parametric method
A character-based method (like parsimony) that employs an explicit model of character evolution
The dominant approach in molecular evolution analyses
It requires a reliable MODEL (model = list of probabilities for various evolutionary changes) – MEGA or jModelTest
Implemented in most phylogenetic programs such as: MEGA, phyML, MrBayes,…

12. Step 3 - Statistical approaches
Aim: to evaluate the significance of the obtained phylogenetic relationships (trees)
Approach:
Bootstrapping: build an initial tree; iteratively resample the original sequences (alignment positions drawn with replacement) and build a new tree from each resampled dataset; each node/branch of the initial tree is checked for identity in the new tree and a score is given. The bootstrap value is the sum of these scores, usually expressed as a percentage of replicates. Available in most ML programs
Approximate likelihood-ratio test (aLRT): an alternative to nonparametric bootstrap and Bayesian estimation of branch support; tests the null hypothesis that the inferred branch has length 0; fast, but not directly comparable with bootstrap values. Available in the phyML program.
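The resampling step of the bootstrap can be sketched as follows. Alignment columns are drawn with replacement, the same columns for every sequence, and a grouping of interest is checked in each replicate. The `clade_found` callback stands in for "build a tree and look for the clade"; here it is replaced by a deliberately crude toy check, and all names and data are assumptions for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    length = len(alignment[0])
    columns = [rng.randrange(length) for _ in range(length)]
    # The same resampled columns are used for every sequence.
    return ["".join(seq[c] for c in columns) for seq in alignment]


def bootstrap_support(alignment, clade_found, replicates=100, seed=1):
    """Percentage of replicates in which clade_found(replicate) is true."""
    rng = random.Random(seed)
    hits = sum(
        bool(clade_found(bootstrap_alignment(alignment, rng)))
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


def first_two_group_together(aln):
    """Toy stand-in for tree building: is sequence 0 closer to 1 than to 2?"""
    def differences(x, y):
        return sum(p != q for p, q in zip(x, y))
    return differences(aln[0], aln[1]) < differences(aln[0], aln[2])


toy = ["ACGTACGT", "ACGTACGA", "TCGAACTA"]
support = bootstrap_support(toy, first_two_group_together)
```

In a real analysis the callback would rebuild the tree with the same method as the initial tree, which is why bootstrapping multiplies the running time by the number of replicates.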

13. Step 4 – Presenting phylogeny data: phylogenetic trees
A phylogenetic tree (also called a dendrogram) is a presentation of the evolutionary relationships among organisms.
A phylogenetic tree is composed of:
  Nodes (representing the relationship among taxa/sequences as a special event in the past which remained fixed in evolution)
  Branches (their length represents the number of changes in sequences)
  Leaves (represent the recent taxa)
May be rooted or unrooted. Rooted: the root is the common ancestor of all sequences and internal nodes, and the distance from root to leaf corresponds to evolutionary time

14. Analysing microsatellite data – a practical example

15. Microsatellites are best for…
→ microsatellite loci provide excellent resolution of recent and ongoing microevolutionary processes (Wang 2010)
average mutation rate of microsatellites: l = 5 × 10^-4 (Goldstein & Schlotterer 1999; Whittaker et al. 2003)
DNA fragments of different sizes are detected by initial amplification using polymerase chain reaction (PCR) and visualization via electrophoresis -> size polymorphism reflects variation in the number of repeats of a simple DNA sequence

16. How many trees to sample?
Obtaining accurate allele frequencies and accurate estimates of diversity is much more important than detecting all of the alleles, given that very rare alleles (i.e. new mutations) are not very informative for assessing genetic diversity within a population or genetic structure among populations (Hale et al. 2012).
25 to 30 individuals per population suffice for population genetic studies based on microsatellite allele frequencies (Hale et al. 2012).

17. Analysis of microsatellite data
Deviations from Hardy-Weinberg equilibrium
Null alleles
Linkage disequilibrium (measurement of proximal genomic space)
Allelic indices (Na, Ne, Ho, He, Ar)
F statistics (Fis, Fst, Fit)
Genetic structure & genetic distances
Spatial genetic structure
…
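Several of the allelic indices listed above follow directly from allele frequencies. The sketch below computes, for one locus in one population, the number of alleles (Na), the effective number of alleles (Ne = 1 / sum of squared allele frequencies), observed heterozygosity (Ho), expected heterozygosity (He = 1 - sum of squared allele frequencies, without small-sample correction) and the fixation index F = 1 - Ho/He. Allelic richness (Ar) needs rarefaction and is not shown. The function name and the genotypes are invented for illustration and are not the beech data.

```python
from collections import Counter


def locus_indices(genotypes):
    """Basic diversity indices for one locus.

    genotypes: list of (allele1, allele2) tuples for diploid individuals,
    with missing genotypes already removed.
    """
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freqs = {allele: c / (2 * n) for allele, c in allele_counts.items()}
    homozygosity = sum(p * p for p in freqs.values())
    ho = sum(a != b for a, b in genotypes) / n
    he = 1 - homozygosity
    return {
        "Na": len(freqs),
        "Ne": 1 / homozygosity,
        "Ho": ho,
        "He": he,
        # F is undefined for a monomorphic locus (He = 0).
        "F": 1 - ho / he if he > 0 else None,
    }


example_locus = [(120, 120), (120, 124), (124, 124), (120, 124)]
indices = locus_indices(example_locus)
```

With two alleles at equal frequency and half of the individuals heterozygous, Ho equals He and F is 0, which is the pattern expected under Hardy-Weinberg equilibrium.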

18. Practical example
3 European beech populations from Slovenia (partial, but real data)
5 loci
Data in GenAlEx format

19. Software needed
GenAlEx (Peakall & Smouse 2012): http://biology-assets.anu.edu.au/GenAlEx/Welcome.html
Genepop (Raymond & Rousset 1995) or Genepop on the web: http://genepop.curtin.edu.au/

20. Data
# loci
# all trees
# populations
# trees per population

21. Export to Genepop
Leave default options
Save as txt file
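For orientation, the exported text file has a simple layout: a title line, one locus name per line, then a "Pop" line before each population, followed by one line per individual with the identifier, a comma and the genotypes (two 3-digit allele codes per locus, 000000 for missing data). The sketch below writes that layout by hand; it is not how GenAlEx performs the export, and the function name, locus names and individuals are invented for illustration.

```python
def to_genepop(title, loci, populations):
    """Build Genepop-format text.

    populations: list of populations; each population is a list of
    (individual_id, [(allele1, allele2), ...]) with one genotype per locus.
    Missing genotypes are given as (0, 0).
    """
    lines = [title]
    lines.extend(loci)  # one locus name per line
    for individuals in populations:
        lines.append("Pop")
        for ind_id, genotypes in individuals:
            # Each allele is written as a zero-padded 3-digit code.
            codes = " ".join(f"{a:03d}{b:03d}" for a, b in genotypes)
            lines.append(f"{ind_id} , {codes}")
    return "\n".join(lines) + "\n"


text = to_genepop(
    "Example microsatellite data (invented)",
    ["L1", "L2"],
    [
        [("P1_01", [(120, 124), (0, 0)])],
        [("P2_01", [(118, 118), (200, 204)])],
    ],
)
```

Opening the exported txt file and checking that it follows this layout is a quick way to catch export problems before pasting the data into Genepop on the web.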

22. Deviations from HWE
Null hypothesis = random union of gametes
Open Genepop on the web: http://genepop.curtin.edu.au/
Select option 1 (Hardy Weinberg Exact Tests)
Copy and paste the data into the form at the bottom, select HTML - plain text as the results format, otherwise leave default options
Press submit
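What "random union of gametes" implies can be illustrated with expected genotype counts: n·p² for homozygotes and 2·n·p·q for heterozygotes. The sketch below compares observed and expected counts with a chi-square statistic. This is a simpler stand-in chosen for illustration; Genepop option 1 uses exact tests, which behave better with many alleles and small samples. The function name and genotype counts are invented.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Chi-square statistic and degrees of freedom for one locus."""
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freqs = {allele: c / (2 * n) for allele, c in allele_counts.items()}
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Expected count under random union of gametes.
        if a == b:
            expected = n * freqs[a] ** 2
        else:
            expected = n * 2 * freqs[a] * freqs[b]
        chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    k = len(freqs)
    dof = k * (k - 1) // 2  # genotype classes minus number of alleles
    return chi2, dof


in_equilibrium = [(120, 120)] * 25 + [(120, 124)] * 50 + [(124, 124)] * 25
no_heterozygotes = [(120, 120)] * 50 + [(124, 124)] * 50
chi2_eq, dof_eq = hwe_chi_square(in_equilibrium)
chi2_dev, dof_dev = hwe_chi_square(no_heterozygotes)
```

The second example, with no heterozygotes at all, gives a statistic far above 3.841 (the 5% critical value for one degree of freedom), while the first matches the expected counts exactly.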

23. Deviations from HWE - results
Results given per locus & per population
Note: adjust probability values according to the Bonferroni procedure for multiple comparisons when comparing multiple populations or loci (Rice 1989)

24. Adjusting for multiple comparisons
Bonferroni correction procedure:
From genepop output, pop 2
Sort by P value
Divide alpha (0.05) by the number of tests (comparisons) to get the adjusted alpha
If P value < adjusted alpha, then the null hypothesis can be rejected
No deviations from HWE in our dataset
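The correction itself is simple arithmetic. The sketch below shows the plain Bonferroni rule described above and, because the slide sorts the P values first, a sequential variant in which the smallest P value is compared with alpha/m, the next with alpha/(m-1), and so on until the first non-significant test. The P values are invented for illustration and are not the Genepop output for pop 2; the function names are assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Plain Bonferroni: compare every P value with alpha / number of tests."""
    adjusted_alpha = alpha / len(p_values)
    return adjusted_alpha, [p < adjusted_alpha for p in p_values]


def sequential_bonferroni(p_values, alpha=0.05):
    """Sequential (step-down) Bonferroni; results are in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] < alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger P values are also non-significant
    return reject


# Five invented P values, one per locus.
p_per_locus = [0.012, 0.40, 0.03, 0.75, 0.20]
adjusted_alpha, rejected = bonferroni(p_per_locus)
rejected_sequential = sequential_bonferroni(p_per_locus)
```

With five tests the adjusted alpha is 0.01, so a P value of 0.012 that would look significant on its own no longer leads to rejection of the null hypothesis.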

25. Linkage disequilibrium
Null hypothesis = genotypes at one locus are independent from genotypes at the other locus
Open Genepop on the web: http://genepop.curtin.edu.au/
Select option 2 (Linkage Disequilibrium)
Copy and paste the data into the form at the bottom, select HTML - plain text as the results format, otherwise leave default options
Press submit

26. Linkage disequilibrium - results
No deviations from linkage equilibrium in our dataset

27. Null alleles
Maximum likelihood estimation of null allele frequency
Open Genepop on the web: http://genepop.curtin.edu.au/
Select option 8 (Miscellaneous Utilities)
Copy and paste the data into the form at the bottom, select HTML - plain text as the results format, otherwise leave default options
Press submit
Note: consider also INEST, MicroChecker, FreeNA for checking null alleles and their significance

28. Null alleles - results
Null alleles are present at locus L5 with high frequency (significant - CI): omit locus L5 from further analysis or adjust allele frequencies
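A rough cross-check of a null allele estimate can be made from the heterozygote deficit alone. The sketch below uses two simple published estimators, Brookfield's first estimator r = (He - Ho) / (1 + He) and Chakraborty's r = (He - Ho) / (He + Ho). These are swapped in for illustration and are not the maximum likelihood method that Genepop applies; the function name, the clamping of negative values to 0 and the Ho/He values are assumptions, not results for locus L5.

```python
def null_allele_frequency(ho, he, method="brookfield1"):
    """Estimate null allele frequency from observed and expected heterozygosity."""
    if method == "brookfield1":
        r = (he - ho) / (1 + he)
    elif method == "chakraborty":
        r = (he - ho) / (he + ho)
    else:
        raise ValueError(f"unknown method: {method}")
    # A heterozygote excess gives a negative value; report it as 0 here.
    return max(0.0, r)


# Invented values showing a clear heterozygote deficit.
r_brookfield = null_allele_frequency(ho=0.4, he=0.8)
r_chakraborty = null_allele_frequency(ho=0.4, he=0.8, method="chakraborty")
```

Both estimators attribute the whole heterozygote deficit to null alleles, so inbreeding or population substructure can inflate them; this is one reason to compare several programs, as noted on the previous slide.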

29. Back to GenAlEx – allelic indices
Add-Ins → GenAlEx → Frequency
Go through results
Populations have genetic diversity estimates of similar values