Slide1
Genome structure and evolution
Jan Pačes
Institute of Molecular Genetics AS CR
Slide2
sizes of selected completed genomes
genome                      chromosomes   size                genes
Mycoplasma genitalium       -             0.58 Mbp            521
Escherichia coli            -             4.6 Mbp (5.4 Mbp)   4 377 (5 416)
Saccharomyces cerevisiae    16            12.5 Mbp            5 770
Caenorhabditis elegans      6             ~100 Mbp            19 427
Arabidopsis thaliana        5             ~115 Mbp            ~28 k
Drosophila melanogaster     5             ~122 Mbp            13 379
Homo sapiens                24            ~3.3 Gbp            ~22.5 k
Slide3
genome complexity
Slide4
genome sizes
Arabidopsis thaliana: genome size ~100 Mbp
Psilotum nudum: genome size ~250 Gbp
Slide5
irregular genome sizes?
Schizosaccharomyces pombe
fission yeast, genome smaller than that of many bacteria
genome 12 462 637 bp, 4 929 genes
Mimivirus
virus of an amoeba
genome 1 181 404 bp, 1 262 genes
Tetraodon nigroviridis (pufferfish)
roughly the same number of genes as human, genome size only 1/10th
300 Mbp, 27 918 genes
Slide6
C-value
C-value refers to the amount of DNA contained within a haploid nucleus, measured in picograms (pg)
among diploid organisms the terms C-value and genome size are used interchangeably
in polyploids the C-value may represent two or more genomes contained within the same nucleus
in animals, C-values range more than 3,300-fold
genome size (bp) = (0.978 x 10^9) x DNA content (pg)
DNA content (pg) = genome size (bp) / (0.978 x 10^9)
1 pg = 978 Mbp
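The two conversion formulas on this slide can be sketched in a few lines of Python. This is only an illustration; the function names are hypothetical, and the only number used is the slide's factor of 0.978 x 10^9 bp per picogram.

```python
# Conversion factor taken from the slide: 1 pg of DNA corresponds to 0.978 x 10^9 bp.
BP_PER_PG = 0.978e9


def pg_to_bp(picograms: float) -> float:
    """Convert DNA content (pg) to genome size (bp)."""
    return picograms * BP_PER_PG


def bp_to_pg(base_pairs: float) -> float:
    """Convert genome size (bp) to DNA content (pg)."""
    return base_pairs / BP_PER_PG


if __name__ == "__main__":
    # 1 pg expressed in Mbp, and a ~3.3 Gbp genome expressed in pg.
    print(f"1 pg = {pg_to_bp(1.0) / 1e6:.0f} Mbp")
    print(f"3.3 Gbp = {bp_to_pg(3.3e9):.2f} pg")
```

The two functions are inverses of each other, so converting a value to base pairs and back should return the starting value up to floating-point rounding.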
Slide7
genome sizes
0.0023 pg in the parasitic microsporidium Encephalitozoon intestinalis
1 400 pg in a protist, the free-living amoeba Chaos chaos
Gregory T, http://www.genomesize.com
Slide8
Slide9
C-value enigma
What types of non-coding DNA are found in different eukaryotic genomes, and in what proportions?
From where does this non-coding DNA come, and how is it spread and/or lost from genomes over time?
What effects, or perhaps even functions, does this non-coding DNA have for chromosomes, nuclei, cells, and organisms?
Why do some species exhibit remarkably streamlined chromosomes, while others possess massive amounts of non-coding DNA?
What is the minimal genome?
Slide10
e-cell
model and reconstruct biological phenomena in silico
http://www.e-cell.org
Slide11
Synthetic genomes
Mycoplasma laboratorium
Gibson D, et al. (2008): Complete Chemical Synthesis, Assembly, and Cloning of a Mycoplasma genitalium Genome. Science. DOI: 10.1126/science.1151721
Synthia
synthetic bacterium whose genome, derived from that of Mycoplasma mycoides, was synthesized from scratch and transplanted into a Mycoplasma capricolum cell
Gibson D, et al. (2010): Creation of a bacterial cell controlled by a chemically synthesized genome. Science. DOI: 10.1126/science.1190719
Slide12
just for fun – watermarks
"TO LIVE, TO ERR, TO FALL, TO TRIUMPH, TO RECREATE LIFE OUT OF LIFE."
"SEE THINGS NOT AS THEY ARE, BUT AS THEY MIGHT BE."
"WHAT I CANNOT BUILD, I CANNOT UNDERSTAND."
PACES
VENTERINSTITVTE
CRAIGVENTER
HAMSMITH
CINDIANDCLYDE
GLASSANDCLYDE
Slide13
Rhodobacter capsulatus, GC content
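Only the title of this slide survives in the transcript; the plot itself is not reproduced. As a rough illustration of what a GC-content profile is, the sketch below computes the GC fraction in fixed-size windows along a sequence. The function names, the window and step sizes, and the choice to ignore ambiguous bases are all assumptions made for the example, not the method behind the original figure.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    # Windows with no unambiguous bases (e.g. all N) are reported as 0.0.
    return gc / total if total else 0.0


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """Return (window start, GC fraction) for each full window along seq."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy sequence, not real Rhodobacter capsulatus data.
    for start, gc in windowed_gc("ATATGCGCGGCCATAT", window=8, step=4):
        print(start, f"{gc:.2f}")
```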
Slide14
Homo sapiens, gene distribution
Saccone S, et al. (2001) Chromosome Res.
Slide15
structure of human genome
To date, 3,164.7 million nucleotides have been read.
The average gene is 3 thousand nucleotides long; the longest gene (dystrophin) is 2.4 million nucleotides long.
The number of genes is between 20k and 30k (23k).
Less than 2% of the genome codes for proteins.
The function of more than 50% of the genes is unknown.
DNA is more than 99.9% identical between all humans.
Repetitive elements that do not code for proteins ("junk DNA") make up more than 50% of the human genome.
Entropy rate is around 1.7 (0.9 for the Y chromosome).
Around 20% of our genome is transcribed.
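The slide does not say how the entropy rate was estimated. One common way to approximate it, shown here only as a sketch, is to take the difference between the Shannon entropies of overlapping (k+1)-mers and k-mers, which gives the average number of bits per nucleotide given the k preceding nucleotides. The function names are hypothetical, the unit (bits per nucleotide, maximum 2 for a four-letter alphabet) is an assumption, and the estimate is biased for short sequences or large k.

```python
import math
from collections import Counter


def block_entropy(seq: str, k: int) -> float:
    """Shannon entropy (in bits) of the distribution of overlapping k-mers."""
    n = len(seq) - k + 1
    if k < 1 or n < 1:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(seq: str, k: int) -> float:
    """Estimate bits per nucleotide given the k preceding nucleotides."""
    if k == 0:
        return block_entropy(seq, 1)
    return block_entropy(seq, k + 1) - block_entropy(seq, k)


if __name__ == "__main__":
    # A perfectly periodic toy sequence: base frequencies are uniform,
    # but each base is fully determined by the one before it.
    toy = "ACGT" * 100
    print(conditional_entropy(toy, 0))
    print(conditional_entropy(toy, 1))
```

On the toy sequence the order-0 value is at its maximum while the order-1 value is close to zero, which is why context-aware estimates fall below 2 bits on repetitive sequence.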
Slide16
importance of “junk” DNA
syncytin (adapted ancestral env polyprotein)
Blond JL, et al. (1999): Molecular characterization and placental expression of HERV-W, a new human endogenous retrovirus family. J Virol
social behavior in rodents (and possibly humans)
Hammock EA, Young LJ (2005): Microsatellite instability generates diversity in brain and sociobehavioral traits. Science
regulation of gene expression and promotion of genetic diversity
Peaston A, et al. (2004): Retrotransposons Regulate Host Genes in Mouse Oocytes and Preimplantation Embryos. Developmental Cell
evolution of sequences, for example, an antifreeze-protein gene in a species of fish
DeVries AL and Cheng C-HC (2005): Antifreeze proteins in polar fishes. Fish Physiology
source of microRNAs
Woolfe A, et al. (2005): Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol
LINE-1 capable of repairing broken strands of DNA
Morrish TA, et al. (2002): DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nature Genetics
Slide17
synthesizing non-natural parts from natural genomic template
Journal of Biological Engineering 2009, 3:2
doi:10.1186/1754-1611-3-2
Pawan K Dhar, Chaw Su Thwin, Kyaw Tun, Yuko Tsumoto, Sebastian Maurer-Stroh, Frank Eisenhaber and Uttam Surana
The current knowledge of genes and proteins comes from 'naturally designed' coding and non-coding regions. It would be interesting to move beyond natural boundaries and make user-defined parts. To explore this possibility we made six non-natural proteins in E. coli. We also studied their potential tertiary structure and phenotypic outcomes.
The chosen intergenic sequences were amplified and expressed using the pBAD202/D-TOPO vector. All six proteins showed significantly low similarity to the known proteins in the NCBI protein database. The protein expression was confirmed through Western blot. The endogenous expression of one of the proteins resulted in cell growth inhibition. The growth inhibition was completely rescued by culturing cells in the inducer-free medium. Computational structure prediction suggests globular tertiary structure for two of the six non-natural proteins synthesized.
Slide18
main events in genome evolution
mutations (SNP)
duplications
rearrangements
horizontal transfer
parasitic DNA
Slide19
how and where to find transposons
Repbase: database of repetitive elements, http://www.girinst.org/repbase
RepeatMasker: search for repeats in a genome sequence, http://www.repeatmasker.org
Slide20
repetitive elements in human genome
Transposons: transposon-derived repeats, interspersed repeats; 45% of the genome
Micro- and minisatellites: simple sequence repeats, repetitions of simple short direct repeats; 3% of the genome
Duplications: duplications of genome segments of different lengths (10 - 300 kb), inter- and intrachromosomal; 3.3% of the genome
Other types of repeats: centromeric and telomeric repeats
IHGSC, Nature 2001
Slide21
transposons in human (vertebrate) genome
DNA transposons
retrotransposons: RNA as intermediate, reverse transcription
LTR retrotransposons (similar to retroviruses)
polyA retrotransposons (colinear with mRNA, polyA)
human chromosome 21
Slide22
DNA transposons
2-3 kb
terminal inverted repeats (50 - 100 bp)
cut-and-paste mechanism
3% of the genome
at least 7 classes, some of them unrelated
Slide23
LTR retrotransposons
LTR: long terminal repeat
Human Endogenous Retroviruses (HERVs)
RNA intermediate (RNA pol. II)
short insertional duplications (4-6 bp)
8% of the genome
100 000 elements, tens of families
Slide24
LINE1 (L1) elements
LINE: long interspersed element
polyA (non-LTR) retrotransposons
RNA intermediate (internal promoter for RNA pol. II)
insertion duplications of different lengths (5-15 bp)
insertion preference (TT | AAAA)
17% of the genome
500 000 elements, often truncated at the 5' end
30-60 active LINE1 elements in the genome
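The insertion preference noted on this slide can be illustrated with a simple scan for the TTAAAA motif. The sketch below reads the slide's notation as a cut between TT and AAAA, which is an assumption, and reports the cut position for every exact match on the strand it is given. The function name is hypothetical, and real insertion sites are more degenerate than a single exact motif.

```python
import re
from typing import List

# Preferred insertion motif from the slide; the cut is assumed to fall
# between "TT" and "AAAA".
L1_MOTIF = re.compile("TTAAAA")


def find_l1_target_sites(seq: str) -> List[int]:
    """Return 0-based cut positions (index of the first A) for each exact match."""
    return [m.start() + 2 for m in L1_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Toy sequence with two candidate sites.
    print(find_l1_target_sites("GGTTAAAACCGTTAAAAG"))
```

Only the given strand is scanned; to cover the opposite strand as well, the same function can be run on the reverse complement of the sequence.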
Slide25
nonautonomous elements
They do not encode the enzymes needed for their own transposition.
For each class of autonomous elements there exist nonautonomous elements. These elements rely on the replication mechanism specific to the corresponding autonomous elements.
Slide26
SINE (Alu) elements
SINE: short interspersed element
polyA (non-LTR) retrotransposons
RNA intermediate (internal promoter for RNA pol. III)
insertion duplications (5-15 bp)
insertion preference (TT | AAAA)
10% of the genome
1 000 000 elements, often truncated at the 5' end
Slide27
processed pseudogenes
colinear with mRNA
missing introns and promoters; polyA
often truncated at the 5' end
bordered by direct repeats of different lengths (4-15 bp)
insertion sites are similar to those of LINE1 transposition
generated by L1
Slide28
coevolution of “DNA parasites”
DNA transposons
LTR retrotransposons
polyA retrotransposons
Slide29
HERV16 - example
http://hervd.img.cas.cz
Slide30
1000 Genomes Project
current status
Trio project: two families with ~42x coverage; Yoruba and Caucasian
Low-coverage project: ~5x coverage of unrelated individuals; 60 Yoruba, 60 Caucasians, 30 Han, 30 Japanese
Exon project: 8000 exons (900 genes) by capture array, >50x coverage, 700 unrelated individuals
+ 2 individual sequences (Watson and Venter)
1000GPC, Nature 2010
Slide31
stability / fluidity of the genome
each person carries ~200 to 300 loss-of-function variants in annotated genes and 50 - 100 variants previously implicated in inherited disorders
germline substitution rate: ~10^-8 per base per generation
1000GPC, Nature 2010
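To put the substitution rate in perspective, it can be multiplied by the genome size to get an expected number of new substitutions per generation. The sketch below is back-of-the-envelope arithmetic only: it combines the rate from this slide with the ~3.3 Gbp human genome size from the genome-size table, and the function name and the `copies` parameter are hypothetical.

```python
# Values taken from the slides; treat both as order-of-magnitude figures.
SUBSTITUTION_RATE = 1e-8    # per base per generation
HAPLOID_GENOME_BP = 3.3e9   # ~3.3 Gbp, Homo sapiens


def expected_new_substitutions(rate: float, genome_bp: float, copies: int = 1) -> float:
    """Expected number of new germline substitutions per generation."""
    return rate * genome_bp * copies


if __name__ == "__main__":
    per_haploid = expected_new_substitutions(SUBSTITUTION_RATE, HAPLOID_GENOME_BP)
    per_diploid = expected_new_substitutions(SUBSTITUTION_RATE, HAPLOID_GENOME_BP, copies=2)
    print(f"about {per_haploid:.0f} per haploid genome, {per_diploid:.0f} per diploid genome")
```

With these inputs the estimate is on the order of tens of new substitutions per generation, which is the scale the slide's title refers to.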
Slide32
ENCODE
Encyclopedia Of DNA Elements
Raney, NAR 2010
Slide33
genome browsers
Golden Path: http://genome.ucsc.edu
ENSEMBL: http://www.ensembl.org
Slide34
Slide35
Slide36
Slide37
that’s it, thank you
Institute of Molecular Genetics AS CR
Free and Open Bioinformatics Association