Genome structure and evolution


Uploaded on 2020-04-03






Slide1

Genome structure and evolution

Jan Pačes

Institute of Molecular Genetics AS CR

Slide2

sizes of selected completed genomes

genome                    chromosomes  size               genes
Mycoplasma genitalium                  0.58 Mbp           521
Escherichia coli                       4.6 Mbp (5.4 Mbp)  4 377 (5 416)
Saccharomyces cerevisiae  16           12.5 Mbp           5 770
Caenorhabditis elegans    6            ~100 Mbp           19 427
Arabidopsis thaliana      5            ~115 Mbp           ~28 k
Drosophila melanogaster   5            ~122 Mbp           13 379
Homo sapiens              24           ~3.3 Gbp           ~22.5 k
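The size and gene columns can be combined into a gene density, which makes the contrast between compact bacterial genomes and the human genome explicit. A minimal Python sketch, assuming the approximate table values (28,000 and 22,500 for the "~28 k" and "~22.5 k" entries, and only the first E. coli figures); the names `GENOMES`, `genes_per_mbp` and `density_table` are illustrative.

```python
from __future__ import annotations

# (genome size in Mbp, number of genes), approximate values from the table;
# "~" entries are used as plain numbers and only the first E. coli figures are used.
GENOMES = {
    "Mycoplasma genitalium": (0.58, 521),
    "Escherichia coli": (4.6, 4377),
    "Saccharomyces cerevisiae": (12.5, 5770),
    "Caenorhabditis elegans": (100.0, 19427),
    "Arabidopsis thaliana": (115.0, 28000),
    "Drosophila melanogaster": (122.0, 13379),
    "Homo sapiens": (3300.0, 22500),
}


def genes_per_mbp(size_mbp: float, genes: int) -> float:
    """Gene density: number of genes per megabase pair."""
    return genes / size_mbp


def density_table() -> dict[str, float]:
    """Gene density for every genome in GENOMES."""
    return {name: genes_per_mbp(size, genes) for name, (size, genes) in GENOMES.items()}


if __name__ == "__main__":
    # Print genomes from the most to the least gene-dense.
    for name, density in sorted(density_table().items(), key=lambda kv: -kv[1]):
        print(f"{name:26s} {density:8.1f} genes/Mbp")
```

With these inputs the two bacteria come out at roughly 900 to 950 genes per Mbp and human at about 7, a more than hundredfold difference.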

Slide3

genome complexity

Slide4

genome sizes

Arabidopsis thaliana: genome size ~100 Mbp

Psilotum nudum: genome size ~250 Gbp

Slide5

irregular genome sizes?

Schizosaccharomyces pombe (fission yeast): fewer genes than many bacteria; genome 12 462 637 bp, 4 929 genes

Mimivirus (a virus of an amoeba): genome 1 181 404 bp, 1 262 genes

Tetraodon nigroviridis (pufferfish): a similar number of genes to human in a genome only 1/10th the size; 300 Mbp, 27 918 genes

Slide6

C-value

C-value refers to the amount of DNA, in picograms, contained within a haploid nucleus.

Among diploid organisms, the terms C-value and genome size are used interchangeably.

In polyploids, the C-value may represent two or more genomes contained within the same nucleus.

In animals, C-values vary more than 3,300-fold.

$$\text{genome size (bp)} = (0.978 \times 10^{9}) \times \text{DNA content (pg)}$$

$$\text{DNA content (pg)} = \frac{\text{genome size (bp)}}{0.978 \times 10^{9}}$$

1 pg = 978 Mbp
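The two conversion formulas are inverses of each other. A minimal Python sketch, assuming the 0.978 × 10^9 bp per pg factor given above; the function names `pg_to_bp` and `bp_to_pg` are illustrative.

```python
# Conversion factor: 1 pg of DNA corresponds to 0.978e9 bp (978 Mbp).
BP_PER_PG = 0.978e9


def pg_to_bp(dna_pg: float) -> float:
    """Genome size (bp) = 0.978e9 x DNA content (pg)."""
    return BP_PER_PG * dna_pg


def bp_to_pg(genome_bp: float) -> float:
    """DNA content (pg) = genome size (bp) / 0.978e9."""
    return genome_bp / BP_PER_PG


if __name__ == "__main__":
    # Haploid human genome of ~3.3 Gbp (table on Slide2), expressed in picograms.
    print(f"{bp_to_pg(3.3e9):.2f} pg")
    # One picogram expressed in megabase pairs.
    print(f"{pg_to_bp(1.0) / 1e6:.0f} Mbp")
```

For the ~3.3 Gbp human genome this gives a haploid C-value of about 3.37 pg.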

Slide7

genome sizes

from 0.0023 pg in the parasitic microsporidium Encephalitozoon intestinalis

to 1 400 pg in a protist, the free-living amoeba Chaos chaos

Gregory T http://www.genomesize.com
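Converting the two extremes with the 0.978 × 10^9 bp per pg factor shows how wide the range is. A minimal Python sketch; the constant names and `fold_range` are illustrative, and the inputs are simply the two values quoted on this slide.

```python
BP_PER_PG = 0.978e9  # bp per picogram, from the C-value conversion

# Extreme DNA contents quoted on this slide (pg per haploid nucleus)
SMALLEST_PG = 0.0023  # Encephalitozoon intestinalis
LARGEST_PG = 1400.0   # Chaos chaos


def fold_range(low_pg: float, high_pg: float) -> float:
    """How many times larger the largest DNA content is than the smallest."""
    return high_pg / low_pg


if __name__ == "__main__":
    print(f"smallest: {SMALLEST_PG * BP_PER_PG / 1e6:.2f} Mbp")
    print(f"largest:  {LARGEST_PG * BP_PER_PG / 1e9:,.0f} Gbp")
    print(f"span:     {fold_range(SMALLEST_PG, LARGEST_PG):,.0f}-fold")
```

This puts the smallest genome at roughly 2.2 Mbp and the largest at about 1,370 Gbp, a span of roughly 600,000-fold.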

Slide8

Slide9

C-value enigma

What types of non-coding DNA are found in different eukaryotic genomes, and in what proportions?

From where does this non-coding DNA come, and how is it spread and/or lost from genomes over time?

What effects, or perhaps even functions, does this non-coding DNA have for chromosomes, nuclei, cells, and organisms?

Why do some species exhibit remarkably streamlined chromosomes, while others possess massive amounts of non-coding DNA?

What is the minimal genome?

Slide10

e-cell

model and reconstruct biological phenomena in silico

http://www.e-cell.org

Slide11

Synthetic genomes

Mycoplasma laboratorium

Gibson D, et al. (2008): Complete Chemical Synthesis, Assembly, and Cloning of a Mycoplasma genitalium Genome. Science. DOI: 10.1126/science.1151721

Synthia

a synthetic bacterium, created by chemically synthesizing a genome derived from Mycoplasma mycoides from scratch and transplanting it into a Mycoplasma capricolum cell

Gibson D, et al. (2010): Creation of a bacterial cell controlled by a chemically synthesized genome. Science. DOI: 10.1126/science.1190719

Slide12

just for fun – watermarks

"TO LIVE, TO ERR, TO FALL, TO TRIUMPH, TO RECREATE LIFE OUT OF LIFE."

"SEE THINGS NOT AS THEY ARE, BUT AS THEY MIGHT BE."

"WHAT I CANNOT BUILD, I CANNOT UNDERSTAND."

PACES

VENTERINSTITVTE

CRAIGVENTER

HAMSMITH

CINDIANDCLYDE

GLASSANDCLYDE

Slide13

Rhodobacter capsulatus, GC content
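A GC-content profile like the one on this slide is usually produced by sliding a fixed-size window along the sequence and recording the fraction of G and C bases in each window. A minimal Python sketch; the window and step sizes are free parameters chosen for illustration, the function names are hypothetical, and the toy sequence is invented (it is not Rhodobacter capsulatus data).

```python
from __future__ import annotations


def gc_content(seq: str) -> float:
    """Fraction of G and C in seq (case-insensitive; N and other symbols count in the length)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def gc_windows(seq: str, window: int, step: int) -> list[tuple[int, float]]:
    """(start, GC fraction) for windows of `window` bp advancing by `step` bp.

    A trailing window shorter than `window` is skipped.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ATATATATGCGCGCGCATAT"  # made-up sequence for illustration
    for start, gc in gc_windows(toy, window=8, step=4):
        print(start, round(gc, 2))
```

Real genome plots use windows of kilobases and often smooth the result; the choice of window trades resolution against noise.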

Slide14

Homo sapiens, gene distribution

Saccone S, et al. (2001) Chromosome Res.

Slide15

structure of human genome

To date, 3,164.7 million nucleotides have been read.

The average gene is 3,000 nucleotides long; the longest gene (dystrophin) is 2.4 million nucleotides long.

The number of genes is between 20,000 and 30,000 (about 23,000).

Less than 2% of the genome codes for proteins.

The function of more than 50% of the genes is unknown.

DNA is more than 99.9% identical between all humans.

Repetitive elements, which do not code for proteins ("junk DNA"), make up more than 50% of the human genome.

The entropy rate is around 1.7 bits per nucleotide (0.9 for the Y chromosome).

Around 20% of our genome is transcribed.
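The entropy rate measures how predictable the next nucleotide is, in bits per nucleotide (at most 2 for four equiprobable bases). The slide does not say which estimator produced its figures; one common choice, assumed here, is the block-entropy difference H_k minus H_(k-1) over overlapping k-mers. A minimal Python sketch with hypothetical function names, compared on random DNA and on a pure (CA)n repeat:

```python
import math
import random
from collections import Counter


def block_entropy(seq: str, k: int) -> float:
    """Shannon entropy, in bits, of the overlapping k-mer distribution of seq."""
    if k == 0:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def entropy_rate_estimate(seq: str, k: int) -> float:
    """H(next base | preceding k-1 bases), estimated as H_k - H_(k-1)."""
    return block_entropy(seq, k) - block_entropy(seq, k - 1)


if __name__ == "__main__":
    rng = random.Random(0)
    random_dna = "".join(rng.choice("ACGT") for _ in range(100_000))
    ca_repeat = "CA" * 50_000  # a pure (CA)n microsatellite, perfectly predictable
    print("random DNA:", round(entropy_rate_estimate(random_dna, 3), 3))
    print("(CA)n:     ", round(entropy_rate_estimate(ca_repeat, 3), 3))
```

Random DNA comes out close to 2 bits and the repeat close to 0; genomic DNA falls in between because repeats and compositional biases make it partly predictable. Large k needs very long sequences, otherwise the estimate is biased downward.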

Slide16

importance of “junk” DNA

syncytin (adapted ancestral env polyprotein)

Blond JL, et al. (1999): Molecular characterization and placental expression of HERV-W, a new human endogenous retrovirus family. J Virol

social behavior in rodents (and possibly humans)

Hammock EA, Young LJ (2005): Microsatellite instability generates diversity in brain and sociobehavioral traits. Science

regulation of gene expression and promotion of genetic diversity

Peaston A, et al. (2004): Retrotransposons Regulate Host Genes in Mouse Oocytes and Preimplantation Embryos. Developmental Cell

evolution of sequences, for example, an antifreeze-protein gene in a species of fish

DeVries AL and Cheng C-HC (2005): Antifreeze proteins in polar fishes. Fish Physiology

source of microRNAs

Woolfe A, et al. (2005): Highly conserved non-coding sequences are associated with vertebrate development. PLoS Biol

LINE-1 elements capable of repairing broken DNA strands

Morrish TA, et al. (2002): DNA repair mediated by endonuclease-independent LINE-1 retrotransposition. Nature Genetics

Slide17

synthesizing non-natural parts from natural genomic template

Journal of Biological Engineering 2009, 3:2

doi:10.1186/1754-1611-3-2

Pawan K Dhar, Chaw Su Thwin, Kyaw Tun, Yuko Tsumoto, Sebastian Maurer-Stroh, Frank Eisenhaber and Uttam Surana

The current knowledge of genes and proteins comes from 'naturally designed' coding and non-coding regions. It would be interesting to move beyond natural boundaries and make user-defined parts. To explore this possibility we made six non-natural proteins in E. coli. We also studied their potential tertiary structure and phenotypic outcomes.

The chosen intergenic sequences were amplified and expressed using the pBAD202/D-TOPO vector. All six proteins showed very low similarity to known proteins in the NCBI protein database. Protein expression was confirmed by Western blot. Endogenous expression of one of the proteins resulted in cell growth inhibition. The growth inhibition was completely rescued by culturing cells in inducer-free medium. Computational structure prediction suggests a globular tertiary structure for two of the six non-natural proteins synthesized.
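Reading an intergenic DNA sequence as a protein requires picking a reading frame and applying the genetic code. A minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) and a frame chosen by the user; this is not the authors' pipeline, and `translate` and `CODON_TABLE` are illustrative names.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in the order TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, frame: int = 0, to_stop: bool = True) -> str:
    """Translate DNA in frame 0, 1 or 2 with the standard code.

    Codons containing non-ACGT characters become 'X'; with to_stop=True
    translation ends at the first stop codon ('*').
    """
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCTAAGGG"))                 # stops at TAA
    print(translate("ATGGCCTAAGGG", to_stop=False))  # keeps the stop as '*'
```

Intergenic regions usually contain stop codons in every frame, so in practice one would look for a stretch long enough to encode a protein before cloning it.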

Slide18

main events in genome evolution

mutations (SNPs)

duplications

rearrangements

horizontal transfer

parasitic DNA

Slide19

how and where to find transposons

Repbase: database of repetitive elements, http://www.girinst.org/repbase

RepeatMasker: searches genome sequences for repeats, http://www.repeatmasker.org
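RepeatMasker finds repeats by comparing the sequence against libraries of known elements such as Repbase. As a much simpler illustration of what "masking" means, the sketch below lowercases (soft-masks) every base covered by a k-mer that occurs at least a given number of times within the sequence itself. This de novo toy approach is a swapped-in technique, not RepeatMasker's method, and `soft_mask_repeats`, `masked_fraction`, `k` and `min_count` are hypothetical names and parameters.

```python
from collections import Counter


def soft_mask_repeats(seq: str, k: int = 12, min_count: int = 3) -> str:
    """Lowercase every position covered by a k-mer seen >= min_count times in seq.

    Toy de novo masking; it does not use Repbase or alignment-based search.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    masked = [False] * len(seq)
    for i, kmer in enumerate(kmers):
        if counts[kmer] >= min_count:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(b.lower() if m else b for b, m in zip(seq, masked))


def masked_fraction(masked_seq: str) -> float:
    """Fraction of bases that were soft-masked (lowercase)."""
    if not masked_seq:
        return 0.0
    return sum(b.islower() for b in masked_seq) / len(masked_seq)


if __name__ == "__main__":
    unit = "GATTACAGGCTT"  # made-up 12 bp repeat unit
    result = soft_mask_repeats("AAAA" + unit * 3 + "CCCC")
    print(result, round(masked_fraction(result), 3))
```

Lowercase soft-masking keeps the sequence intact while flagging repeats, which is the convention many genome browsers use for repeat-masked assemblies.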

Slide20

repetitive elements in human genome

Transposons: transposon-derived (interspersed) repeats; 45% of the genome

Micro- and minisatellites: simple sequence repeats, i.e. direct tandem repeats of short units; 3% of the genome

Duplications: duplicated genome segments of various lengths (10-300 kb), inter- and intrachromosomal; 3.3% of the genome

Other types of repeats: centromeric and telomeric repeats

IHGSC, Nature 2001
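Simple sequence repeats can be located by scanning for a short unit repeated back to back. A minimal Python sketch using a regular expression with a backreference; it assumes perfect (error-free) repeats, units of 1 to 6 bp and at least 4 copies, all illustrative thresholds rather than the criteria behind the percentages above, and `find_tandem_repeats` is a hypothetical name.

```python
import re


def find_tandem_repeats(seq: str, max_unit: int = 6, min_copies: int = 4):
    """Yield (start, end, unit, copies) for perfect tandem repeats.

    A unit of 1..max_unit bases must occur at least min_copies times in a row;
    the lazy quantifier prefers the shortest unit that works.
    """
    pattern = re.compile(rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_copies - 1},}}")
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        yield m.start(), m.end(), unit, (m.end() - m.start()) // len(unit)


if __name__ == "__main__":
    for hit in find_tandem_repeats("GGT" + "CA" * 10 + "TTG"):
        print(hit)
```

Dedicated tools tolerate mismatches and indels within repeats; an exact-match regex only finds pristine copies.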

Slide21

transposons in the human (vertebrate) genome

DNA transposons

retrotransposons: RNA intermediate, reverse transcription

LTR retrotransposons (similar to retroviruses)

polyA retrotransposons (colinear with mRNA, polyA tail)

human chromosome 21

Slide22

DNA transposons

length 2-3 kb

terminal inverted repeats (50-100 bp)

cut-and-paste mechanism

3% of the genome

at least 7 classes, some of them unrelated
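Terminal inverted repeats mean that the first bases of the element are the reverse complement of its last bases. A minimal Python sketch that checks this for a candidate element of known boundaries; it detects only perfect TIRs of a user-chosen length, and the function names and the 10 bp TIR in the example are made up for illustration (real TIRs are 50-100 bp and often imperfect).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters are kept as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def has_terminal_inverted_repeats(element: str, tir_len: int) -> bool:
    """True if the first tir_len bases are the reverse complement of the last tir_len bases.

    Only perfect TIRs are detected.
    """
    if tir_len <= 0 or 2 * tir_len > len(element):
        return False
    left = element[:tir_len].upper()
    right = element[-tir_len:].upper()
    return left == reverse_complement(right)


if __name__ == "__main__":
    tir = "CAGGGTTTAC"  # made-up 10 bp TIR
    element = tir + "ACGT" * 10 + reverse_complement(tir)
    print(has_terminal_inverted_repeats(element, len(tir)))
```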

Slide23

LTR retrotransposons

LTR: long terminal repeat

Human Endogenous Retroviruses (HERVs)

RNA intermediate (transcribed by RNA pol. II)

short duplications of the insertion site (4-6 bp)

8% of the genome

100 000 elements, tens of families
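The two LTRs of a newly inserted element are identical because both are copied from the same template during reverse transcription; they accumulate differences afterwards, so LTR identity is often used as a rough indicator of insertion age. The short duplication of the insertion site shows up as identical sequence on both sides of the element. A minimal Python sketch, assuming the element boundaries are already known and the two LTRs are already aligned without gaps; `percent_identity` and `target_site_duplication` are hypothetical names.

```python
from typing import Optional


def percent_identity(ltr5: str, ltr3: str) -> float:
    """Percent identical positions between two aligned, equal-length LTRs (no gaps)."""
    if not ltr5 or len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be non-empty, aligned and of equal length")
    same = sum(a == b for a, b in zip(ltr5.upper(), ltr3.upper()))
    return 100.0 * same / len(ltr5)


def target_site_duplication(left_flank: str, right_flank: str, length: int) -> Optional[str]:
    """Return the duplication if the last `length` bp left of the element
    equal the first `length` bp right of it, else None."""
    if length <= 0:
        raise ValueError("length must be positive")
    left = left_flank[-length:].upper()
    right = right_flank[:length].upper()
    return left if len(left) == length and left == right else None


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTTC"))
    print(target_site_duplication("GGGGATCGA", "ATCGATTTT", 5))
```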

Slide24

LINE1 (L1) elements

LINE: long interspersed element

polyA (non-LTR) retrotransposons

RNA intermediate (internal promoter for RNA pol. II)

insertion duplications of different lengths (5-15 bp)

insertion preference (TT|AAAA)

17% of the genome

500 000 elements, often truncated at the 5' end

30-60 active LINE1 elements in the genome
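The TT|AAAA insertion preference can be turned into a simple motif scan. A minimal Python sketch, assuming the bar marks the cleavage point between TT and AAAA; it searches one strand for the exact consensus only (real target sites are degenerate), and `candidate_l1_sites` is a hypothetical name. A zero-width lookahead lets overlapping sites be reported.

```python
import re

# Lookahead so that overlapping TTAAAA occurrences are all found.
L1_SITE = re.compile(r"(?=TTAAAA)")


def candidate_l1_sites(seq: str) -> list:
    """0-based positions of the assumed cut point (between TT and AAAA) on the given strand."""
    return [m.start() + 2 for m in L1_SITE.finditer(seq.upper())]


if __name__ == "__main__":
    print(candidate_l1_sites("GCTTAAAAGTTAAAAAC"))
```

To cover both strands one would also scan the reverse complement and map positions back.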

Slide25

nonautonomous elements

They do not code for the enzymes needed for their own transposition.

For each class of autonomous elements there are corresponding nonautonomous elements. These rely on the transposition machinery encoded by the autonomous elements.

Slide26

SINE (Alu) elements

SINE: short interspersed element

polyA (non-LTR) retrotransposons

RNA intermediate (internal promoter for RNA pol. III)

insertion duplications (5-15 bp)

insertion preference (TT|AAAA)

10% of the genome

1 000 000 elements, often truncated at the 5' end

Slide27

processed pseudogenes

colinear with mRNA

missing introns and promoters; polyA tail

often truncated at the 5' end

bordered by direct repeats of different lengths (4-15 bp)

insertion sites are similar to those of LINE1 retrotransposition

generated by L1
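Two of the hallmarks listed for processed pseudogenes, a 3' polyA tail and flanking direct repeats of 4-15 bp, can be checked for a candidate whose boundaries are already known. A minimal Python sketch using exact matching only; `poly_a_tail_length` and `longest_flanking_direct_repeat` are hypothetical names, and the 4-15 bp bounds are taken from the slide.

```python
def poly_a_tail_length(seq: str) -> int:
    """Number of consecutive A's at the 3' end of seq."""
    s = seq.upper()
    return len(s) - len(s.rstrip("A"))


def longest_flanking_direct_repeat(left: str, right: str, min_len: int = 4, max_len: int = 15) -> str:
    """Longest k in [min_len, max_len] such that the end of the left flank equals
    the start of the right flank; returns '' when there is none."""
    left, right = left.upper(), right.upper()
    for k in range(max_len, min_len - 1, -1):
        if k <= len(left) and k <= len(right) and left[-k:] == right[:k]:
            return left[-k:]
    return ""


if __name__ == "__main__":
    print(poly_a_tail_length("ATGCCAAAAAA"))
    print(longest_flanking_direct_repeat("GGGTTCAGTA", "TTCAGTACCC"))
```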

Slide28

coevolution of “DNA parasites”

DNA transposons

LTR retrotransposons

polyA retrotransposons

Slide29

HERV16 - example

http://hervd.img.cas.cz

Slide30

1000 Genomes Project

current status

Trio project: two families (Yoruba and Caucasian) with ~42x coverage

Low-coverage project: ~5x coverage of unrelated individuals (60 Yoruba, 60 Caucasians, 30 Han, 30 Japanese)

Exon project: 8 000 exons (900 genes) by capture array, >50x coverage, 700 unrelated individuals

+ 2 individual sequences (Watson and Venter)

1000GPC, Nature 2010
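Coverage figures such as ~5x and ~42x can be read through the classic Lander-Waterman (Poisson) model: with mean depth c, the expected fraction of bases covered by no read is e^(-c). This idealized model assumes uniformly random read placement, which real data (GC bias, repeats) violate, and it is not how the 1000 Genomes Project computed its statistics. A minimal Python sketch with hypothetical function names:

```python
import math


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases with zero reads, Poisson model with mean depth `coverage`."""
    return math.exp(-coverage)


def fraction_at_least(coverage: float, k: int) -> float:
    """P(depth >= k) when read depth is Poisson-distributed with mean `coverage`."""
    return 1.0 - sum(
        math.exp(-coverage) * coverage**i / math.factorial(i) for i in range(k)
    )


if __name__ == "__main__":
    for c in (5, 42, 50):
        print(f"{c}x: uncovered {fraction_uncovered(c):.2e}, depth>=10 {fraction_at_least(c, 10):.4f}")
```

At ~5x about 0.7% of bases are expected to be missed entirely, and most bases have too few reads to call a genotype confidently, which is why the low-coverage project relied on combining information across many individuals.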

Slide31

stability / fluidity of the genome

~200 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders per individual

germline substitution rate of ~$10^{-8}$ per base per generation

1000GPC, Nature 2010
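The per-base substitution rate translates into a rough number of new mutations per child by multiplying it by the number of bases in a diploid genome. A minimal Python sketch, assuming a uniform rate of 10^-8 per base per generation, a ~3.3 Gbp haploid genome (table on Slide2) and two copies of the genome; `expected_de_novo` is a hypothetical name.

```python
def expected_de_novo(rate_per_bp: float, haploid_bp: float, ploidy: int = 2) -> float:
    """Expected new germline substitutions per individual per generation."""
    return rate_per_bp * haploid_bp * ploidy


if __name__ == "__main__":
    # 1e-8 per base per generation and a ~3.3 Gbp haploid genome
    print(round(expected_de_novo(1e-8, 3.3e9)))
```

This back-of-the-envelope estimate gives on the order of 60 to 70 new substitutions per individual per generation.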

Slide32

ENCODE

Encyclopedia Of DNA Elements

Raney, NAR 2010

Slide33

genome browsers

Golden Path: http://genome.ucsc.edu

ENSEMBL: http://www.ensembl.org

Slide34

Slide35

Slide36

Slide37

that’s it, thank you

Institute of Molecular Genetics AS CR

Free and Open Bioinformatics Association