Design and Use of RepeatMasker - PowerPoint Presentation

Uploaded On 2024-02-09

Presentation Transcript

1. Design and Use of RepeatMasker
Jeremy Buhler, jbuhler@wustl.edu

2. Parts of RepeatMasker
Programs: Smit AFA, Hubley R, and Green P. “RepeatMasker-Open 4.0.” 2013-2015. http://www.repeatmasker.org/
RMBlast (NCBI BLAST variant) and HMMER for comparisons
Data: Dfam, https://www.dfam.org

3. Overview
Sources of repetitive sequence data
How RepeatMasker finds repeats
Issues and limitations

4. Data Source
Uses a library of known repeat sequences
Supplied by Dfam (“DNA families DB”)
Repeat families in Dfam are carefully curated using multiple alignment tools.

5. Example of a repeat family summary page from Dfam

6. Repeats are DNA Motifs
Repeats occur in multiple instances, so use motif technology to represent them
accgataggtatacgtatca-tttacgatac
atcgct-ggtttacgcgtcaattcaggatgc
accggt-tgtttacgtagcaatctaggatac
accgat-ggtttacgtatcaatttaggatac

7. Two kinds of model: consensus and HMM (= weight matrix + gaps)
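As a rough illustration of the two model types, the sketch below builds a per-column count table (a crude weight matrix that also counts gaps) and a majority-rule consensus from the four aligned instances on slide 6. The split into four 31-column rows is one reading of that slide, and the function names are invented for this example. Dfam's curated consensus sequences and profile HMMs are built far more carefully than this.

```python
from collections import Counter

# Four aligned instances from slide 6 (row boundaries are an assumption).
ALIGNED = [
    "accgataggtatacgtatca-tttacgatac",
    "atcgct-ggtttacgcgtcaattcaggatgc",
    "accggt-tgtttacgtagcaatctaggatac",
    "accgat-ggtttacgtatcaatttaggatac",
]

def column_counts(rows):
    """Count the symbols seen in each alignment column, gaps included."""
    return [Counter(column) for column in zip(*rows)]

def consensus(rows):
    """Take the most common symbol per column; drop columns where a gap wins."""
    symbols = []
    for counts in column_counts(rows):
        symbol, _ = counts.most_common(1)[0]
        if symbol != "-":
            symbols.append(symbol)
    return "".join(symbols)

print(consensus(ALIGNED))
```

The count table keeps the per-position variation that the consensus throws away, which is the extra information an HMM-style model can exploit.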

8. Why Use Motifs for Repeats?
Faster to compare one sequence/model to genome than many seqs
Even simple motifs, like a consensus sequence, are better than individual instances for discovering new copies of a repeat.
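A toy example may help show why a consensus beats individual instances. Everything below is hypothetical: a made-up ancestral sequence, three instances that each carry their own private substitutions, and a new copy with substitutions of its own. Because the private substitutions fall at different positions, the majority-rule consensus recovers the ancestor, so the new copy differs from it only by its own changes, while it differs from each instance by its own changes plus that instance's.

```python
from collections import Counter

def mutate(seq, changes):
    """Return seq with the {position: base} substitutions applied."""
    bases = list(seq)
    for position, base in changes.items():
        bases[position] = base
    return "".join(bases)

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def consensus(rows):
    """Most common base in each column of equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))

ANCESTOR = "acgtacgtacgtacgtacgt"  # hypothetical repeat, not real data
instances = [
    mutate(ANCESTOR, {0: "t", 5: "a"}),
    mutate(ANCESTOR, {3: "c", 10: "a", 17: "t"}),
    mutate(ANCESTOR, {8: "c", 14: "a"}),
]
new_copy = mutate(ANCESTOR, {2: "a", 12: "g", 19: "a"})

model = consensus(instances)
print("mismatches vs consensus:", hamming(new_copy, model))
print("mismatches vs instances:", [hamming(new_copy, inst) for inst in instances])
```

Real repeat copies also carry insertions and deletions, so an actual search needs alignment rather than a position-by-position comparison.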

9. Utility of Motif vs Instances
[Figure; surviving labels: acacgtatagctactggttcaggc, 334]

10. Utility of Motif vs Instances
[Figure; surviving labels: acacgtatagctactggttcaggc, acaggt, 2]

11. Types of Repeats Identified
Interspersed (Alu, LINE, MIR, …)
Micro- and mini-satellites
Noncoding RNAs (tRNA, rRNA, snoRNA, …)
Short tandem + low complexity (agagagag, actactactact, aaaaataataaaa, …)
Common artifacts (E. coli, vectors)
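To make the "short tandem" category concrete, here is a minimal regex sketch that finds exact tandem runs such as agagagag or actactactact. It is not how RepeatMasker, Dust, or TRF work: those tools tolerate mismatches and would also catch low-complexity stretches like aaaaataataaaa, which this pattern misses. The unit-length limit (1 to 6 bases) and the minimum of four copies are arbitrary choices for the example.

```python
import re

# A unit of 1-6 bases (lazy, so the shortest unit is tried first),
# followed by at least three more exact copies of that unit.
TANDEM = re.compile(r"([acgt]{1,6}?)\1{3,}")

def exact_tandem_repeats(seq):
    """Return (start, end, unit, copies) for each exact tandem run, 0-based half-open."""
    hits = []
    for match in TANDEM.finditer(seq.lower()):
        unit = match.group(1)
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits

for hit in exact_tandem_repeats("ccgtagagagagtcactactactactgg"):
    print(hit)
```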

12. Overview
Sources of repetitive sequence data
How RepeatMasker finds repeats
Issues and limitations

13. The Basics
Uses RMBlast (BLAST-like tool) to compare query to consensus model library
Uses HMMER (vaguely BLAST-like, but with much fancier math) to compare query to HMM library

14. Partial Repeats
RepeatMasker will cheerfully report an incomplete match to a repeat.
Detects best-conserved parts
Some repeats (retroposons) are typically incomplete
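One way to see how partial a reported match is: compute what fraction of the repeat consensus the hit covers. The sketch below assumes the whitespace-delimited hit-line layout of RepeatMasker's .out file (score, three percentages, query name, query begin, end, and "(left)", strand, repeat name, class/family, then three repeat-position columns whose order flips on the complement strand). The function name is invented and the two sample lines are illustrative, written in that layout rather than taken from a run of mine.

```python
def repeat_coverage(hit_line):
    """Fraction of the repeat consensus covered by one .out hit line."""
    fields = hit_line.split()
    strand = fields[8]
    if strand == "+":
        # Forward strand: begin, end, (bases left after the match).
        begin, end = int(fields[11]), int(fields[12])
        left = int(fields[13].strip("()"))
    else:
        # Complement strand ("C"): (bases left), end, begin.
        left = int(fields[11].strip("()"))
        end, begin = int(fields[12]), int(fields[13])
    consensus_length = end + left
    return (end - begin + 1) / consensus_length

# Illustrative hit lines in the assumed layout.
HITS = [
    "1306 15.6 6.2 0.0 HSU08988 6563 6781 (22462) C MER7A DNA/MER2_type (0) 336 103",
    "279 3.0 0.0 0.0 HSU08988 7719 7751 (21492) + (TTTTA)n Simple_repeat 1 33 (0)",
]

for line in HITS:
    print(line.split()[9], round(repeat_coverage(line), 3))
```

A low fraction is not necessarily an error: as the slide notes, some repeat classes are usually found as fragments.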

15. Nested Repeats
RepeatMasker tries to detect nesting
[Figure; axis label: time]

19. Nested Repeats
RepeatMasker tries to detect nesting (please don’t ask me how)
[Figure; axis label: time]
See the RepeatMasker presentation by Dr. Jessica Storer for details

20. Overview
Sources of repetitive sequence data
How RepeatMasker finds repeats
Issues and limitations

21. Library Choice
Make sure to use correct libraries for your target species
(Commonly used organisms have preselected library lists)
Danger: mis-identifications!
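In practice the library is chosen on the command line. The sketch below only builds an argument list and does not run anything. It uses RepeatMasker's `-species`, `-lib`, `-engine`, and `-dir` options; the helper name, the file names, and the decision to treat `-species` and `-lib` as strict alternatives are assumptions made for this example, so check `RepeatMasker -help` for your installed version before relying on it.

```python
import shlex

def repeatmasker_command(fasta, species=None, custom_lib=None,
                         engine="rmblast", out_dir="rm_out"):
    """Build (but do not run) a RepeatMasker argument list."""
    # Require an explicit library choice: a species/clade name or a custom FASTA library.
    if (species is None) == (custom_lib is None):
        raise ValueError("give exactly one of species or custom_lib")
    cmd = ["RepeatMasker", "-engine", engine, "-dir", out_dir]
    if species is not None:
        cmd += ["-species", species]
    else:
        cmd += ["-lib", custom_lib]
    cmd.append(fasta)
    return cmd

# Hypothetical input file and species name.
print(shlex.join(repeatmasker_command("contig.fa", species="drosophila melanogaster")))
```

Forcing the caller to name a library makes the choice visible in scripts, which is one guard against the mis-identifications the slide warns about.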

22. Incomplete Masking
Highly diverged repeats can be tough to find
Might leave ends of a repeat unmasked
Is this really a new feature?
[Figure; labels: (masked), BLAST hit]
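A simple sanity check for the situation on this slide is to ask whether a putative new feature sits right next to masked sequence, in which case it may just be the unmasked tail of a repeat. The sketch assumes soft-masked input, where masked bases are lowercase (as RepeatMasker's `-xsmall` option produces). The function name and the 20-base window are hypothetical choices, and a positive answer is only a prompt to look closer, not a verdict.

```python
def near_masked(seq, start, end, window=20):
    """True if the hit [start, end) overlaps or lies within `window` bases of lowercase sequence.

    Coordinates are 0-based and half-open.
    """
    lo = max(0, start - window)
    hi = min(len(seq), end + window)
    return any(base.islower() for base in seq[lo:hi])

# Toy sequence: 50 unmasked bases, a 100-base masked block, 200 unmasked bases.
toy = "A" * 50 + "a" * 100 + "A" * 200

print(near_masked(toy, 150, 180))  # hit starts right where the masked block ends
print(near_masked(toy, 250, 300))  # hit far from any masked sequence
```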

23. Use the Right Tool
Tandem repeats and duplications: Dust (short; Morgulis et al., 2006), TRF (long; Benson et al., 1999)
RNA: tRNAscan-SE, Infernal, …
Other repeats: search for matches to Dfam (HMMER) and the NCBI nt database (BLAST)
Check the “Repeat tools” page on TE Hub

24. In conclusion…
Hey, let’s be careful out there!
Image: Neal Wellons; https://flic.kr/p/FrUFPX