
Article
Volume 12, Issue 3, 2022, 2852-2861
https://doi.org/10.33263/BRIAC123.28522861
https://biointerfaceresearch.com/

Characterization of Flowering Genes of Arabidopsis thaliana for Mirror Repeats

Usha Yadav 1, Sandeep Yadav 1, Dinesh C. Sharma 1,*

1 School of Life Sciences, Starex University, India
* Correspondence: ddcsharma@gmail.com (D.C.S.); Scopus Author ID 57213217014

Received: 28.05.2021; Revised: 5.07.2021; Accepted: 9.07.2021; Published: 8.08.2021

Abstract: A variety of simple DNA repeats are enriched in eukaryotic genomes. Recent studies have shown their importance in understanding genome organization and function, especially how genomes evolve by using them as mutational hotspots during DNA replication. Mirror repeat sequences, the most underrated subset of this class of repeats, are now gaining importance because of their probable involvement in the development of several genetic diseases in humans. These repeats typically adopt H-DNA conformations under both in vitro and in vivo conditions. Plants, on the other hand, have not yet been analyzed for the presence or distribution of these repeats, or for whether the repeats cause disease in them. The present study aims to extract mirror repeats in the flowering genes of Arabidopsis thaliana. To this end, we have deployed FPCB (FASTA-PARALLEL COMPLEMENT-BLAST), an efficacious and quick method to extract perfect and degenerate mirror repeat sequences through pattern matching of alignments with user-defined algorithmic parameters. All the analyzed genes showed high densities of mirror sequences. A total of 93 unique mirror repeats of significant length were extracted from the analyzed genes.

Keywords: DNA repeats; Mirror repeats; H-DNA; FPCB.

© 2021 by the authors. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

DNA is the central molecule that sustains life in the biosphere. It acts as a regulatory circuit that ensures the fidelity of the genetic code for the correct flow of genetic information while allowing variation to arise through mutation, crossing over, and slipped-strand mispairing. These events gradually introduce DNA repeats into genomes [1, 2]. A genomic segment of any length that recurs several times throughout the genome is termed a DNA repeat sequence. Repetitive DNA sequences account for the variation in genome size among many plant species, since the proportion of protein-coding sequence is believed to be broadly similar among them [3]. Until a few years ago, repeats were considered junk or selfish DNA, but they now have well-established roles ranging from embryonic development to infectious diseases [4]. Eukaryotic genomes harbor a great diversity of repetitive DNA sequences [5]. Some studies claim that simple sequence repeats make up as much as 6.7% of the human genome, compared with the previously reported 3% [6].

Based on their frequency of occurrence, these sequences can be grouped into three categories: unique sequence DNA (present in one to a few copies within the genome), moderately repetitive DNA (present in a few to about 10^5 copies), and highly repetitive DNA (present in about 10^5 to 10^7 copies). Repetitive sequences that occur frequently in the genome can be distributed at irregular intervals (dispersed or interspersed repeats) or clustered together (tandem repeats). Based on the length of the dispersed repeat, two families are known: LINEs (Long Interspersed Elements), in which sequences are about 1,000-7,000 bp long, and SINEs (Short Interspersed Elements), in which sequences are about 100-400 bp long [7].

Based on the symmetry and arrangement of the nucleotide sequence, the three major types of DNA repeats are inverted repeats, direct repeats, and mirror repeats. When a genomic segment is exactly duplicated downstream on the same DNA strand, it is a direct repeat. When a genomic segment is reversed and duplicated downstream but on the opposite strand, it is an inverted repeat [8]. According to Mirkin et al., mirror repeats are DNA segments in which bases that are equidistant from the symmetry center are identical to each other. A mirror repeat may be perfect or degenerate depending on the sequence type and the number of spacer nucleotides [9]. All of the above repeats can form various alternative DNA structures depending on sequence length, symmetry, base composition, supercoiling, temperature, free energy, and stabilizing agents. Almost 13% of the human genome can adopt these alternative, or non-B DNA, conformations [10]. Moreover, these sequences are enriched at mutational hotspots in human cancer genomes and stimulate mutagenesis [11, 12].
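The definition of a perfect mirror repeat given above (bases equidistant from the symmetry center are identical) can be illustrated with a minimal Python sketch. This is only an illustration and not the FPCB method used in this study; the function names `is_perfect_mirror` and `find_perfect_mirrors` are hypothetical, the 7 bp default minimum length is an illustrative choice, and degenerate repeats with spacer nucleotides are not handled.

```python
def is_perfect_mirror(seq: str) -> bool:
    """True if bases equidistant from the symmetry center are identical.

    On a single strand this means the sequence reads the same in both
    directions (no complementing is involved, unlike an inverted repeat).
    """
    s = seq.upper()
    return len(s) > 1 and s == s[::-1]


def find_perfect_mirrors(seq: str, min_len: int = 7) -> list[tuple[int, int, str]]:
    """Return maximal perfect mirror repeats as (start, end, subsequence).

    Positions are 1-based and inclusive. min_len is an assumed threshold.
    """
    s = seq.upper()
    n = len(s)
    hits = set()
    # Visit every possible symmetry center: on a base (odd length)
    # or between two bases (even length), then expand outwards.
    for center in range(2 * n - 1):
        left = center // 2
        right = left + center % 2
        while left >= 0 and right < n and s[left] == s[right]:
            left -= 1
            right += 1
        start, end = left + 1, right - 1  # last matching pair, 0-based
        if end - start + 1 >= min_len:
            hits.add((start + 1, end + 1, s[start:end + 1]))
    return sorted(hits)


if __name__ == "__main__":
    print(is_perfect_mirror("TTAGGATT"))   # mirror repeat
    print(is_perfect_mirror("GAATTC"))     # inverted repeat, not a mirror
    print(find_perfect_mirrors("GCATTAGGATTATC"))
```

Note the contrast with an inverted repeat: `GAATTC` equals its own reverse complement but does not read the same backwards on one strand, so it fails the mirror test.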
Inverted repeats can form hairpin-like structures in single-stranded DNA and cruciforms in double-stranded DNA, the latter only under negative supercoiling. This is because cruciforms are topologically equivalent to unwound DNA, so their formation is driven by the energy released when negative supercoils are relaxed [13]. Direct repeats are known to form a range of conformations: slippage structures; left-handed Z-DNA when alternating purine-pyrimidine nucleotides are present; cruciforms when elements of perfect dyad symmetry and even length are repeated; H-DNA when the direct repeats are homopurine-homopyrimidine rich sequences; and a particular G-quartet when tandemly arranged runs of guanines are present in single-stranded DNA [14]. Mirror repeats of varied compositions and types are abundant in natural DNAs [15]. Only a subset of such mirror repeats, those with homopurine-homopyrimidine rich sequences (H-palindromes), is known to form the intramolecular triplex structures commonly termed H-DNA [16]. An H-DNA triplex is formed when a DNA strand from one half of the repeat folds back and its complementary strand remains single-stranded. H-DNA is termed H-y if the strand involved in triplex formation is pyrimidine-rich and H-r if that strand is purine-rich. Like cruciform formation, H-DNA formation requires duplex unwinding, since H-DNA is topologically similar to unwound DNA. Negative supercoiling and the length of the mirror repeat both affect H-DNA formation [17].

These polypurine- and polypyrimidine-rich tracts with mirror repeats confer a unique regulatory capacity on the genome while putting these sequences at high mutational risk. Some studies have shown that they are directly involved in the maintenance of chromosome structure, regulation of gene expression, DNA replication, and genome responses to environmental stimuli and physiological changes [18-20]. While these roles contribute to positive selection pressure during evolution, the genomic instability caused by these sequences indirectly results in the development of a number of diseases involving triple helix formation, such as Autosomal Dominant Polycystic Kidney Disease (ADPKD), Tuberous Sclerosis Complex (TSC), Lymphangioleiomyomatosis (LAM), Friedreich's ataxia, Follicular Lymphoma, and Hereditary Persistence of Fetal Haemoglobin (HPFH) [21-23]. Identifying H-DNA-forming mirror repeats will help pinpoint specific sequences susceptible to high-frequency mutation, which in turn helps in understanding the molecular basis of diseases like ADPKD [24, 25]. A few studies have also noted the occurrence of these mirror-symmetrical sequences in SARS-CoV-2, the virus responsible for the current pandemic [26].

Mirror repeats are also found in plant genomes [3]. Triplex-forming sequences, and diseases or anomalies directly caused by mirror repeats in plants, have not yet been worked out extensively. But their ubiquitous presence across all domains of life indicates key functional roles that need to be correctly outlined.
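The H-palindrome criterion described earlier in this section (a mirror repeat whose strand is homopurine or homopyrimidine) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the helper names are hypothetical, the check requires a strictly pure purine (A/G) or pyrimidine (C/T) strand whereas the text only says "rich", and it does not predict whether H-DNA actually forms, which also depends on supercoiling and repeat length.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def is_perfect_mirror(seq: str) -> bool:
    """True if the strand reads the same in both directions."""
    s = seq.upper()
    return len(s) > 1 and s == s[::-1]


def strand_composition(seq: str) -> str:
    """Classify a strand as 'homopurine', 'homopyrimidine', or 'mixed'."""
    bases = set(seq.upper())
    if bases and bases <= PURINES:
        return "homopurine"
    if bases and bases <= PYRIMIDINES:
        return "homopyrimidine"
    return "mixed"


def is_h_palindrome(seq: str) -> bool:
    """Strict simplification: perfect mirror repeat on a pure R or Y strand."""
    return is_perfect_mirror(seq) and strand_composition(seq) != "mixed"


if __name__ == "__main__":
    for candidate in ("AAGGAGGAA", "TTCCTCCTT", "ATTAGGATTA"):
        print(candidate, strand_composition(candidate), is_h_palindrome(candidate))
```

A mixed-composition mirror repeat such as `ATTAGGATTA` is still a mirror repeat, but under this strict rule it would not be counted as an H-palindrome.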
Precise identification of such sequences and of their distribution is the first task that must be completed before their origin and function can be inferred. To this end, the flowering genes of Arabidopsis thaliana were analyzed for mirror repeat sequences. Arabidopsis thaliana, a small weedy plant belonging to the mustard family, is a well-established model system for basic genetic analysis in the plant kingdom. Its small and completely sequenced genome offers numerous opportunities for bioinformatics study [27]. Since flowering is one of the major events in a plant's life and is essential for reproduction, genes belonging to this transitional stage were chosen for study. Their correct functioning and regulation are essential for a plant. The study of these repetitive sequences in model plants might help in understanding key functions involved in genome evolution and phenotypic change [28, 29]. In the field of genome targeting, an expansion of triplex-forming sequences is foreseen [30].

We have employed FPCB, a simple, accurate, and manual bioinformatics strategy, to identify mirror repeat sequences. FPCB stands for FASTA-Parallel Complement-BLAST, a three-step strategy to find mirror repeat sequences through pattern matching [31]. The technique is based on the principle that when a BLAST analysis is performed between any gene sequence and its parallel complement, some of the resulting alignments will reveal mirror repeats. This is because the parallel complement carries exactly the same nucleotide pattern as the gene sequence when read from its complementary strand, but in the opposite orientation. For the first time, we have characterized the flowering genes of Arabidopsis thaliana for mirror repeats using this strategy. This initial work may serve as a foundation for developing new DNA fingerprinting-like technologies in plants and for deciphering the function and evolution of these repeats. These symmetries may act as therapeutic targets for drug/metabolite delivery. In further studies, this strategy may help find the role of mirror repeats in plants and develop their possible applications.

2. Materials and Methods

A total of 15 flowering genes of Arabidopsis thaliana were analyzed using FPCB. It is a three-step process that uses public resources: NCBI (for gene sequences), the Reverse Complement tool (for conversion to the parallel complement), and BLAST (for aligning sequences).

Step 1: Extracting Gene Sequence

The complete coding sequence of the gene to be analyzed was downloaded in FASTA format from NCBI (https://www.ncbi.nlm.nih.gov/). The whole gene sequence was divided into regions of 1000 bp each so that shorter mirror sequences could be identified. Each such region serves as a query sequence.

Step 2: Conversion To Parallel Complement

Using the Reverse Complement tool, the complement of each 1000 bp region was generated [31]. This complement serves as the subject sequence.
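Steps 1 and 2 above were carried out manually with web tools; the following Python sketch only illustrates what they compute. The function names are hypothetical, the regions are cut as non-overlapping 1000 bp windows (an assumption about how the gene was divided), and reading the FASTA file is left out.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def split_regions(seq: str, size: int = 1000) -> list[str]:
    """Step 1: divide a gene sequence into consecutive regions of `size` bp."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def parallel_complement(seq: str) -> str:
    """Step 2: complement every base WITHOUT reversing the order."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Ordinary reverse complement, shown only for comparison."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    query = "GCATTAGGATTATC"              # toy query region
    subject = parallel_complement(query)  # toy subject sequence
    print(query)
    print(subject)
    # The complementary strand of the subject is the query read backwards,
    # so a stretch that reads the same in both directions (a mirror repeat)
    # aligns with itself at reversed coordinates.
    print(reverse_complement(subject) == query[::-1])
```

The final comparison is the principle behind the method: assuming the alignment tool also searches the complementary strand of the subject, as nucleotide BLAST does by default, any mirror repeat in the query matches the subject at reversed positions.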
Step 3: Extracting Mirror Repeat Sequences Using BLAST

The query and subject sequences described above were aligned for homology using Align Sequences Nucleotide BLAST [31]. The program selection was optimized for somewhat similar sequences. The algorithmic parameters for the expected threshold and word size can be adjusted as needed. The alignments produced can then be analyzed by pattern matching to trace mirror sequences present in the query (gene) sequence. An alignment is a mirror sequence if its position numbers are exactly reversed between the subject and the query sequence (for example, query positions 100 to 130 aligned with subject positions 130 to 100). The total number of mirror sequences present in the query sequence can also be confirmed through the dot matrix plot.

3. Results and Discussion

The final BLAST result window yields a number of alignments with sequence homologies. Alignments in which the position numbers of the nucleotides are exactly reversed between the subject and query sequence were always found to be mirror repeat sequences. The remaining alignments do not harbor any mirror symmetries. A dot-matrix plot of the subject sequence versus the query sequence is also produced by the BLAST analysis, as in Figure 1 (B). In this example the plot shows a total of ten alignments as dots. The number of alignments lying on the diagonal always corresponds to the number of alignments that are mirror repeat sequences (here, 8 alignments lie on the straight line). The others are not mirror sequences (here, 2 alignments lie off the straight line).

Figure 1. Criteria for analyzing alignments for mirror repeats. (A) Screenshot of alignments produced after running nBLAST, outlining the criteria for determining mirror repeats (one alignment labeled as a mirror sequence, one labeled as not a mirror sequence); (B) web page window showing a dot matrix plot of the resulting alignments (red dotted circles are mirror sequences; blue dotted circles are not mirror sequences).

Table 1. Location and occurrence of mirror repeat sequences in the analyzed genes. The four size columns give the number of mirror sequences in each size range.

S.N.  Symbol/Gene ID    Region        7-12bp  13-18bp  19-24bp  ≥25bp  TOTAL
1.    AG/827631         1-1000bp      31      2        0        1      34
                        1000-2000bp   32      7        4        4      47
                        2000-3000bp   22      2        1        1      26
                        3000-4000bp   25      4        4        2      35
                        4000-5000bp   16      2        4        0      22
                        5000-5684bp   18      3        2        1      24
2.    AP1/843244        1-1000bp      30      3        3        2      38
                        1000-2000bp   23      4        1        2      30
                        2000-3000bp   10      2        3        4      19
                        3000-4000bp   24      7        2        1      34
3.    BLR-RPL/831745    1-1000bp      20      5        3        1      29
                        1000-2000bp   11      0        0        0      11
                        2000-3000bp   20      4        1        3      28
                        3000-3559bp   13      1        0        3      17
4.    CO/831441         1-1000bp      16      3        2        0      21
                        1000-2000bp   18      2        1        0      21
                        2000-2924bp   26      1        0        1      28
5.    ETT/817956        1-1000bp      18      4        0        4      26
                        1000-2000bp   24      3        0        0      27
                        2000-3000bp   17      1        1        0      19
                        3000-3773bp   14      0        0        1      15
6.    FLC/830878        1-1000bp      20      1        2        3      26
                        1000-2000bp   28      2        0        2      32
                        2000-3000bp   24      2        3        0      29
                        3000-4000bp   21      3        0        1      25
                        4000-5000bp   23      3        3        0      29
                        5000-6000bp   23      1        3        2      29
                        6000-6067bp   4       0        0        0      4
7.    FT/842859         1-1000bp      28      5        1        0      34
                        1000-2000bp   19      5        4        3      31
                        2000-2627bp   16      4        2        1      23
8.    JAG/843177        1-1000bp      24      2        1        2      29
                        1000-1882bp   16      4        0        1      21
9.    LFY/836307        1-1000bp      20      5        3        3      31
                        1000-2000bp   36      6        2        1      45
                        2000-2742bp   14      3        0        3      20
10.   PHYB/816394       1-1000bp      21      4        1        3      29
                        1000-2000bp   24      2        1        0      27
                        2000-3000bp   24      3        1        0      28
                        3000-4000bp   14      2        1        2      19
                        4000-4702bp   16      3        1        1      20
11.   PI/832146         1-1000bp      21      4        2        2      29
                        1000-2000bp   15      5        2        2      24
                        2000-2671bp   20      0        0        1      21
12.   SEP3/839040       1-1000bp      11      1        3        2      17
                        1000-2000bp   31      3        1        0      35
                        2000-2588bp   13      5        1        2      21
13.   SNZ/818510        1-1000bp      21      3        3        2      29
                        1000-2000bp   23      4        2        2      35
                        2000-2486bp   13      2        1        0      16
14.   SOC1/819174       1-1000bp      20      5        1        5      31
                        1000-2000bp   25      3        1        1      30
                        2000-3000bp   30      2        1        1      34
                        3000-3621bp   10      4        2        0      16
15.   SPL/828841        1-1000bp      25      7        4        1      37
                        1000-1517bp   4       1        1        1      7

Using this simple strategy, we report for the first time the occurrence of a variety of mirror sequences in the flowering genes of Arabidopsis thaliana. Irrespective of gene size, all the studied genes show a high density of mirror symmetries, the highest being found in the 1-1000 bp region, which corresponds to the promoter domains of the genes. Their enrichment in the promoter domains supports their speculated function in replication and transcriptional regulation. Short mirror sequences (7-12 bp) were abundantly present in every gene. As the length of the mirror sequence increases, its abundance in the genome decreases. Apart from variations in sequence density or occurrence, mirror sequences were found in every gene studied. The present work confirms the existence of simple, randomly dispersed DNA sequences with mirror symmetries in the coding segments of the Arabidopsis thaliana genome. This is contrary to the belief that the genome of Arabidopsis thaliana relatively lacks simple DNA repeats.

Table 2. Identified mirror repeats of significant length (≥25 bp) in the analyzed genes.
S.N.  GENE      SIZE (bp)  MIRROR REPEAT
1     AG        27         ATACTTATCTCATAGATTCCATTCATA
                25         CACACACATATATATATAAACACAC
                33         TTTC-TTCTTCTTCTCGTGCTC-TGTTCTTACTTT
                37         ATTCCACACACATATATATATAAACACACTAACATTA
                27         TTATTTTTCACTTTTTTC-CTTCATATT
                25         AAAATTACTCTTTTTAAAATTAAAA
                29         TATGCAATTTCTCTTTCTTTTTGAAGTAT
                43         AAAAGAATAAATGGTAAATTTAATTATATTCCAAATAAGGAAA
                30         GTTGTATCAGTGAATTTTA--TGCTTATGTTG
2     AP1       26         AAAATGTTTAATACA-AATTTGTATAA
                30         TGTTCTCTGTGATGCTGAAGTTGCTCTTGT
                41         TTGAGAATTT--TTTATTAGAAAGAATATTTAACTTACGAGTT
                60         TTGGTTTAATTGCA---TAAAACCATCATTAGATTTATCCTAAAATGTGATGATATTTTGGTT
                28         TTACAAGTGTTATTATAATGTGAACATT
                27         GTATGTAAA--ACCCCTATCAAATGTATG
                25         CATATCTATGTATATGAATATAGAC
                29         ATACATATGTGTATGTATCAATATATATA
                58         GATGTCATGATTTTGAAACTAGAAAACTTTATTTTAAAACATTA---TTTTATTAACGTAG
3     BLR-RPL   29         ATCTGTAAAT--TTTCTTTAATTATTGTCTA
                26         CTGGAGACTCGACAATCTCAGAGATC
                30         CAGAGA---TCTTCTTCTTCCTCTTGGAGAGAC
                33         TGTATCGAAGTGTATTTTTACTTGGAAGTATGT
                52         GAGGTTTGGGCTTGATGGTGGTAGTGGCGATGGTGGTGGT--GGGTATGAAG
                27         AAATTGGATTCAATAACCAATGTTAAA
                34         AAAAATATGTTTGACGTTTGG-GTATGTATAAGAA
4     CO        28         AAGAAGTAAAGACAGA-ACAAATGAAAAA
5     ETT       76         CGTCTTCAGCTTCTGGGTCTGTCTCTCCTACTTCGTCTTCTTCAGCTTC--TGTGTCTGT-GGTGTCTTCGAATTCTGC
                44         GTCTTCTTCAGCTTCTGTGTCTGTG-GTGTCTTCGAATTCTGCTG
                29         CTCACTCATCACTGTGCTACTACTCTCTC
                27         GGTGGAGGGGTTTGTTTGGAGCTGTGG
                26         CTTGTAA--ACTTGGTTCAAGAATGTTC
6     FLC       25         TTGTTCATTTCTCTCTCTATTTCTT
                25         TTGGTGATTATCCAAATTAGTGTTT
                32         AAAACTAGAAATCAAGCGAA-TTGAGAACAAAA
                33         AAATGGTTGTAGTAGTTTGGCCATGTTGGTCAA
                38         TTGATGCATACTTTGTTAGGATTTGTTCACCCCTAGTT
                43         TCTTTCAGT-TAATTTCAGAAAATTAAGAGA----AATATGACTTTCT
                25         GTTTAATTAGGTTTTGG-TTCATTTG
                26         TTTTGGTTCTTCTTCTTCGTTTTTTT
7     FT        75         TTTCTTGTGTTATCTCATTTTCCAAACTTCAAAAAAGAAAAAGAAAAAA----AGACCTTTTGCTTTCTTGATTTCTTT
                27         TATATCACTTTTTATTTTTATTTATAT
                29         CTTCTTCTTTCTTGTGTTATCTCATTTTC
                33         ATAGTATTTTAATTTAATAACC--ATTTTATGATA
8     JAG       37         GAAAGAAGAAAGGGAGTAAAGAAGGGAAAGATGAGAG
                44         TGGAGGTATA-TAGTTATTAGTGTGTG--TATTGTTAATGTGAAGGT
                57         TTTTCTTTTCTTTTCTTTTTTTGTTTTTTTTTGGGTATTTTCTTGTCCTCTTGTTTT
9     LFY       70         TATTTACTTGTATGATATTGATTTA-GAGCTACTGTGTGTATAGAGTATGCAGTCATAGTCTGAT-ATTTAT
                28         TTAATTTATAT--GTTGGATATATTTACTT
                38         ATTGTGACAATTTATATGTAGATG---ATTGGACAGGGTTA
                28         GTTCTAATTTATTAAATT-TTTCATTTTG
                30         TGTTATTGGTTCCAATATTTTGATTATTGT
                25         GTTTGGTTTGGGTAGTTGTGGTTTG
                31         GCGGTTGCTG-CGGCTGCGGCTTTAGTTGGCG
10    PHYB      35         AGTGGCGGTGGCCGTGGCGGTGGCCGTGGCGGAGA
                43         CGGAGTCGGGGGTAGTGGCGGTGGCCGTGGCGGTGGCCGTGGC
                38         GAGCTCGATTCTACTCGAGCGTGCTT-TCGTTGCTCGAG
                78         TTGGTTTGGTTAATTACGAATTTGATTTAGGCGTTTAAAGAATTTGAGGTTTTAACCAATTCACTATTTGTTTTGGTT
                59         TGTTTTGG-TTAT-TGTTTAGTTGGAACCTAGATT--AGTTTGATTTTTGTATTCGGTTTAGT
                38         TTCAAATTGATGAAAACCAGCTCAAAAGT-GTAAAACTT
11    PI        46         CAATTTCT---TTACCAAAAACATGTCAAAAGACCCTTGAATCTTTAAC
                39         TCTCTCTTCACCTCAAGATTAATCAAAAC---ATTTCTCTCT
                37         TATGTATACATTTCTACATACATTCATACACATGAAT
                29         TAGTATAGTATATATATAAAACATTTGAT
                45         GTGTGTTGT-TTGCTTATG-ACCTCTATGTATTGGTTGTGTTGTGTG
12    SEP3      29         TTAATTTCTCTTGTGAGTACTCTTTAATT
                41         TGTCTTGTA-TGTATGGGTCTCTCTGTG-ATGTGTTGTTGTGT
                40         TAACTTTAGACTAGTATAACCAATTTGATTTGAGTTCTAT
                25         CTATTTGAATCTTTCTCACTTAATC
13    SNZ       26         GAAAAGAGTAAACCAAATGATGAAAG
                44         AGATGTCTGAA-AAGTAGCCAAAAACTAATAAATCAGTGTGTAGA
                29         TATAAATGTAGGGTCACGGCTGTAAATAT
                26         TATATCTTATTCAACTTGATCAATAT
14    SOC1      26         TCTTTCTCTTATTTTATTATCTTTCT
                33         ACGAGAAGAGGATCTTTTTTAAGGAGAAAAGCA
                27         GAGAAAAGCAGAGAGAGAAGAGACGAG
                38         GTTTCATTTGGTTCGATTTGATGTGTTTGGTTT-CTTTG
                42         CTCCTATATCTCTACCT-ATACATACACAAACCCTTTATCCTC
                35         TTTAATCATCTGTCTCTCTCTTTCTCAATTAGTTT
                38         ATATATCAATTCTTGCTAATTAATACTTTT--ACTATATA
15    SPL       29         AAAAAGTTTTGATTTTTATTTATGTAAAA
                29         GGGTTTAGGGACCAATGGAGGAATTTGGG

Table 3. Distribution of identified mirror repeats in the selected genera. Ec, Escherichia coli; Cr, Chlamydomonas reinhardtii; Sc, Saccharomyces cerevisiae; Os, Oryza sativa; Ce, Caenorhabditis elegans; Dm, Drosophila melanogaster; Mm, Mus musculus; Hs, Homo sapiens; +, present; -, absent.

S.N.  Mirror sequence                            Ec  Cr  Sc  Os  Ce  Dm  Mm  Hs
1.    CACACACATATATATATAAACACAC                  -   -   -   +   -   +   +   +
2.    AAATTGGATTCAATAACCAATGTTAAA                -   -   +   +   +   -   +   +
3.    GTTGTATCAGTGAATTTTA--TGCTTATGTTG           +   -   +   +   +   +   +   +
4.    AGTGGCGGTGGCCGTGGCGGTGGCCGTGGCGGAGA        -   -   +   -   -   -   -   -
5.    TAACTTTAGACTAGTATAACCAATTTGATTTGAGTTCTAT   -   -   -   -   -   -   -   -

Using megaBLAST analysis, the distribution of the above-identified mirror repeat sequences was studied in different genera (algorithmic parameters: max target sequences = 5000, expected threshold = 500, and word size = 16). The repeats were selected at random, with sizes ranging between 25 and 45 bp. The results confirm the existence of mirror repeat sequences of less than 30 bp in all domains of life, ranging from plants to worms, flies, and humans. Surprisingly, Chlamydomonas reinhardtii, a green alga thought to be evolutionarily closest to plants, does not harbor any of these mirror symmetries. In contrast, mirror sequences identified in coding segments of Arabidopsis thaliana were found to be present in almost every other genus studied for their distribution. These sequences were conserved during the course of evolution. Hence, they likely perform an important function in the genomes of living beings.

4. Conclusions

The present study identified mirror repeats in 15 flowering genes of A. thaliana. A total of 93 unique mirror repeats of significant length were extracted using FPCB. Among them, shorter repeats of 30 base pairs or fewer were present in almost every genus studied through megaBLAST analysis. Hence, it can be concluded that these sequences, being distributed across the genera, may be conserved or may have an important role in evolution. Further studies might reveal whether these mirror repeats are capable of causing disease in plants through triplex formation, as is the case in humans.
The mirror repeats in the genome may open new directions in genomics and evolutionary studies.

Funding

This research received no external funding.

Acknowledgments

We extend our heartiest gratitude to Dr. Vikash Bhardwaj for encouraging us to utilize FPCB.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Gurusaran, M.; Ravella, D.; Sekar, K. RepEx: Repeat Extractor for Biological Sequences. Genomics 2013, 102, 403–408, https://doi.org/10.1016/j.ygeno.2013.07.005.
2. Jurka, J.; Kapitonov, V. V.; Kohany, O.; Jurka, M. V. Repetitive Sequences in Complex Genomes: Structure and Evolution. Annu. Rev. Genomics Hum. Genet. 2007, 8, 241–259, https://doi.org/10.1146/annurev.genom.8.080706.092416.
3. Mehrotra, S.; Goyal, V. Repetitive Sequences in Plant Nuclear DNA: Types, Distribution, Evolution and Function. Genomics, proteomics & bioinformatics 2014, 12, 164–171, https://doi.org/10.1016/j.gpb.2014.07.003.
4. Arancio, W.; Coronnello, C. Repetitive Sequences in Aging. Aging (Albany NY) 2021, 13, 10816–10817, https://doi.org/10.18632/aging.203020.
5. Cox, R.; Mirkin, S. M. Characteristic Enrichment of DNA Repeats in Different Genomes. Proceedings of the National Academy of Sciences 1997, 94, 5237–5242, https://doi.org/10.1073/pnas.94.10.5237.
6. Shortt, J. A.; Ruggiero, R. P.; Cox, C.; Wacholder, A. C.; Pollock, D. D. Finding and Extending Ancient Simple Sequence Repeat-Derived Regions in the Human Genome. Mobile DNA 2020, 11, 11, https://doi.org/10.1186/s13100-020-00206-y.
7. Pathak, D.; Ali, S. Repetitive DNA: A Tool to Explore Animal Genomes/Transcriptomes. Functional genomics. InTech, Published 2012, 155–180, https://doi.org/10.5772/48259.
8. Ussery, D. W.; Wassenaar, T. M.; Borini, S. Computing for Comparative Microbial Genomics: Bioinformatics for Microbiologists. Springer Science & Business Media, 2009, Vol. 8, 1–244.
9. Mirkin, S. M. DNA Topology: Fundamentals. In eLS 2001, https://doi.org/10.1038/npg.els.0001038.
10. Guiblet, W. M.; Cremona, M. A.; Harris, R. S.; Chen, D.; Eckert, K. A.; Chiaromonte, F.; Huang, Y.-F.; Makova, K. D. Non-B DNA: A Major Contributor to Small- and Large-Scale Variation in Nucleotide Substitution Frequencies across the Genome. Nucleic Acids Research 2021, 49, 1497–1516, https://doi.org/10.1093/nar/gkaa1269.
11. McKinney, J. A.; Wang, G.; Mukherjee, A.; Christensen, L.; Subramanian, S. H. S.; Zhao, J.; Vasquez, K. M. Distinct DNA Repair Pathways Cause Genomic Instability at Alternative DNA Structures. Nat Commun 2020, 11, 236, https://doi.org/10.1038/s41467-019-13878-9.
12. Pant, P.; Fisher, M. DNA Triplex with Conformationally Locked Sugar Disintegrates to Duplex: Insights from Molecular Simulations. Biochemical and Biophysical Research Communications 2020, 532, 662–667, https://doi.org/10.1016/j.bbrc.2020.08.097.
13. Krasilnikova, M. M.; Samadashwily, G. M.; Mirkin, S. M. Replication of Simple DNA Repeats. Gene Therapy and Molecular Biology 1999, 3, 397–412, https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.674.4125&rep=rep1&type=pdf.
14. Sundquist, W. I.; Klug, A. Telomeric DNA Dimerizes by Formation of Guanine Tetrads between Hairpin Loops. Nature 1989, 342, 825–829, https://doi.org/10.1038/342825a0.
15. Schroth, G. P.; Ho, P. S. Occurrence of Potential Cruciform and H-DNA Forming Sequences in Genomic DNA. Nucleic acids research 1995, 23, 1977–1983, https://doi.org/10.1093/nar/23.11.1977.
16. Mirkin, S. M.; Lyamichev, V. I.; Drushlyak, K. N.; Dobrynin, V. N.; Filippov, S. A.; Frank-Kamenetskii, M. D. DNA H Form Requires a Homopurine–Homopyrimidine Mirror Repeat. Nature 1987, 330, 495–497, https://doi.org/10.1038/330495a0.
17. Mirkin, S. M.; Frank-Kamenetskii, M. D. H-DNA and Related Structures. Annual review of biophysics and biomolecular structure 1994, 23, 541–576, https://doi.org/10.1146/annurev.bb.23.060194.002545.
18. Charlesworth, B.; Sniegowski, P.; Stephan, W. The Evolutionary Dynamics of Repetitive DNA in Eukaryotes. Nature 1994, 371, 215–220, https://doi.org/10.1038/371215a0.
19. Pabis, K. Triplex and Other DNA Motifs Show Motif-Specific Associations with Mitochondrial DNA Deletions and Species Lifespan. Mechanisms of Ageing and Development 2021, 194, 111429, https://doi.org/10.1016/j.mad.2021.111429.
20. Brazda, V.; Fojta, M.; Bowater, R. P. Structures and Stability of Simple DNA Repeats from Bacteria. Biochem J 2020, 477, 325–339, https://doi.org/10.1042/BCJ20190703.
21. Bissler, J. J. Triplex DNA and Human Disease. Front Biosci 2007, 12, 4536–4546, https://doi.org/10.2741/2408.
22. Khristich, A. N.; Armenia, J. F.; Matera, R. M.; Kolchinski, A. A.; Mirkin, S. M. Large-Scale Contractions of Friedreich's Ataxia GAA Repeats in Yeast Occur during DNA Replication Due to Their Triplex-Forming Ability. Proceedings of the National Academy of Sciences 2020, 117, 1628–1637, https://doi.org/10.1073/pnas.1913416117.
23. L, P.; Gf, R. Alternative DNA Structures In Vivo: Molecular Evidence and Remaining Questions. Microbiol Mol Biol Rev 2020, 85, https://doi.org/10.1128/mmbr.00110-20.
24. Blaszak, R. T.; Potaman, V.; Sinden, R. R.; Bissler, J. J. DNA Structural Transitions within the PKD1 Gene. Nucleic acids research 1999, 27, 2610–2617, https://doi.org/10.1093/nar/27.13.2610.
25. Zhang, J.; Fakharzadeh, A.; Pan, F.; Roland, C.; Sagui, C. Atypical Structures of GAA/TTC Trinucleotide Repeats Underlying Friedreich's Ataxia: DNA Triplexes and RNA/DNA Hybrids. Nucleic Acids Research 2020, 48, 9899–9917, https://doi.org/10.1093/nar/gkaa665.
26. Reza Dawoudi, M. Mathematical Modeling Approaches to Understanding Severe Acute Respiratory Syndrome Coronavirus 2 (SARSCoV-2) DNA Sequences Linked Coronavirus Disease (COVID-19) for Discovery of Potential New Drugs. OAJBS 2020, 2, https://doi.org/10.38125/OAJBS.000173.
27. Lan, Y.; Sun, R.; Ouyang, J.; Ding, W.; Kim, M.-J.; Wu, J.; Li, Y.; Shi, T. AtMAD: Arabidopsis Thaliana Multi-Omics Association Database. Nucleic Acids Research 2021, 49, D1445–D1451, https://doi.org/10.1093/nar/gkaa1042.
28. Negm, S.; Greenberg, A.; Larracuente, A. M.; Sproul, J. S. RepeatProfiler: A Pipeline for Visualization and Comparative Analysis of Repetitive DNA Profiles. Mol Ecol Resour 2021, 21, 969–981, https://doi.org/10.1111/1755-0998.13305.
29. Sproul, J. S.; Barton, L. M.; Maddison, D. R. Repetitive DNA Profiles Reveal Evidence of Rapid Genome Evolution and Reflect Species Boundaries in Ground Beetles. Systematic Biology 2020, 69, 1137–1148, https://doi.org/10.1093/sysbio/syaa030.
30. Taniguchi, Y.; Magata, Y.; Osuki, T.; Notomi, R.; Wang, L.; Okamura, H.; Sasaki, S. Development of Novel C-Nucleoside Analogues for the Formation of Antiparallel-Type Triplex DNA with Duplex DNA That Includes TA and DUA Base Pairs. Org. Biomol. Chem. 2020, 18, 2845–2851, https://doi.org/10.1039/D0OB00420K.
31. Vikash, B.; Swapni, G.; Sitaram, M.; Kulbhushan, S. FPCB: A Simple and Swift Strategy for Mirror Repeat Identification. arXiv preprint arXiv:1312.3869 2013, https://arxiv.org/abs/1312.3869v
