Slide 1
Evolution of the genes and genomes of 76 arthropod species
Gregg Thomas
Indiana University, @greggwcthomas
Arthropod Genomics Symposium, 06.09.17
Slides 2-3
Arthropods are the largest group of multicellular organisms
https://fivethirtyeight.com/features/the-bugs-of-the-world-could-squish-us-all
Slide 4
Arthropods exhibit great phenotypic and behavioral diversity
Slide 5
The i5k pilot project has sequenced 27 insect genomes
https://www.hgsc.bcm.edu/arthropods/i5k
Slides 6-8
27 genomes sequenced as part of the i5k pilot project
+ 49 previously sequenced arthropod genomes
Spanning 21 arthropod orders
Tree labels: (4 species), (2 species), (7 species), (24 species), (6 species), (5 species), (14 species)
Slides 9-10
Questions
What are the relationships among species and orders?
What are the patterns of genome evolution?
What did the genome of the last insect common ancestor (LICA) look like?
Slide 11
Orthology prediction from OrthoDB
Rob Waterhouse
38,195 ortho-groups across 76 arthropod species
Slides 12-13
Orthologs for phylogenetics
We rely on single-copy orthologs for species tree reconstruction to minimize gene tree discordance due to duplications and losses
How many are in our data?
0
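As a rough illustration of the "How many are in our data?" check, the sketch below counts ortho-groups that are single-copy in every species. The table layout (ortho-group ID mapped to per-species gene counts), the function name, and the toy IDs and species names are all hypothetical; the talk does not show how the count was made.

```python
def single_copy_groups(counts, species):
    """Return ortho-group IDs with exactly one gene in every listed species."""
    return sorted(
        og for og, per_species in counts.items()
        if all(per_species.get(sp, 0) == 1 for sp in species)
    )

# Toy data (hypothetical): ortho-group -> {species: gene count}
toy_counts = {
    "OG1": {"spA": 1, "spB": 1, "spC": 1},
    "OG2": {"spA": 1, "spB": 2, "spC": 1},  # duplicated in spB
    "OG3": {"spA": 1, "spB": 1},            # missing from spC
}
toy_species = ["spA", "spB", "spC"]

print(single_copy_groups(toy_counts, toy_species))
```

A single duplication or loss in any one species is enough to disqualify a group, which is why the count can fall to 0 across 76 species.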
Slides 14-15
EOG8DFS3J
Single-copy in all but one species (2 copies in Plutella xylostella)
Problem: Hemiptera not monophyletic
Problem: Lepidoptera and Trichoptera within Diptera
Slides 16-18
As the number of species increases, the number of sequenced single-copy genes decreases
How can we turn our species-rich data into sequence-rich data?
Construct a backbone phylogeny by using single-copy orthologs among orders rather than species
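The slides do not say how each order was represented when picking single-copy orthologs "among orders". The sketch below assumes one representative species per order, which is only one possible reading, and shows on toy data how relaxing the requirement from all species to the representatives raises the number of usable genes. All names and counts are hypothetical.

```python
def count_single_copy(counts, species):
    """Count ortho-groups with exactly one gene in every listed species."""
    return sum(
        all(per_species.get(sp, 0) == 1 for sp in species)
        for per_species in counts.values()
    )

# Toy data (hypothetical): two orders with two species each
toy_counts = {
    "OG1": {"a1": 1, "a2": 1, "b1": 1, "b2": 1},
    "OG2": {"a1": 1, "a2": 2, "b1": 1, "b2": 1},  # duplicated in a2
    "OG3": {"a1": 1, "a2": 1, "b1": 1, "b2": 0},  # lost in b2
}
orders = {"OrderA": ["a1", "a2"], "OrderB": ["b1", "b2"]}

all_species = [sp for members in orders.values() for sp in members]
# Assumption: the first species listed stands in for its order.
representatives = [members[0] for members in orders.values()]

n_all = count_single_copy(toy_counts, all_species)
n_backbone = count_single_copy(toy_counts, representatives)
print(n_all, n_backbone)
```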
Slides 19-20
Backbone Phylogeny Construction
Phylum       # Orders   # single-copy orthologs
Arthropoda   21         150
2 alignment methods: MUSCLE, PASTA
Maximum Likelihood gene trees
3 species tree methods: Consensus, Concatenation, Coalescent
Slides 21-22
The backbone phylogeny based on:
-150 genes
-PASTA alignment
-ASTRAL
Monophyletic Crustacea??
Tree labels: (4 species), (2 species), (7 species), (24 species), (6 species), (5 species), (14 species)
Slides 23-24
Monophyletic Crustacea?
Our inferred topology…
…differs from other inferred topologies
Slide 25
Slides 26-27
We observe 4 of the 15 possible Pancrustacea topologies
What happens if we decrease the number of species?
Slide 28
Slide 29
And if we decrease the number of species again?
Slide 30
Slides 31-32
Monophyletic Crustacea?
Even with more genes, methods still disagree on the correct topology
Of the 15 possible topologies, we recovered 8
Half the methods support a monophyletic Crustacea
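The figure of 15 matches the number of rooted bifurcating topologies on four tips, (2n - 3)!! with n = 4. The sketch below assumes the slides refer to four Pancrustacea lineages (the tip names are placeholders, not the lineages shown on the slides) and checks the count by brute-force enumeration.

```python
def insert_leaf(tree, leaf):
    """Yield every tree obtained by attaching `leaf` to one branch of `tree`."""
    # Attach on the branch leading to this node.
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)

def rooted_topologies(leaves):
    """Enumerate rooted bifurcating trees by adding one tip at a time."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        trees = [t for tree in trees for t in insert_leaf(tree, leaf)]
    return trees

def canonical(tree):
    """Order-independent form, so (A, B) and (B, A) compare equal."""
    return tree if isinstance(tree, str) else frozenset(canonical(c) for c in tree)

def count_formula(n):
    """(2n - 3)!!, the number of rooted bifurcating trees on n >= 2 labeled tips."""
    result = 1
    for k in range(3, 2 * n - 2, 2):
        result *= k
    return result

tips = ["tip1", "tip2", "tip3", "tip4"]  # placeholder lineage names
trees = rooted_topologies(tips)
n_distinct = len({canonical(t) for t in trees})
print(len(trees), n_distinct, count_formula(len(tips)))
```

For four tips the enumeration and the formula both give 15, so recovering 4 or 8 of them means the methods together cover a large share of the possible arrangements.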
Slide 33
Multi-species order phylogeny construction
2 alignment methods: MUSCLE, PASTA
Maximum Likelihood gene trees
3 species tree methods: Consensus, Concatenation, Coalescent
Order         # Species   # single-copy orthologs
Araneae       4           1627
Hemiptera     7           2053
Hymenoptera   24          2121
Coleoptera    6           3880
Lepidoptera   5           3660
Diptera       14          1324
Slide 34
The Arthropod phylogeny
Araneae: 1627 genes
Hemiptera: 2053 genes
Hymenoptera: 2121 genes
Coleoptera: 3880 genes
Lepidoptera: 3660 genes
Diptera: 1324 genes
Slide 35
All methods agree
Slide 36
Disagreement between methods
Slide 37
Questions
What are the relationships among species and orders?
What are the patterns of genome evolution?
What did the genome of the last insect common ancestor (LICA) look like?
Slide 38
What are the rates of evolution?
Amino acid substitution rates
Gene gain/loss rates
Slide 39
Evolutionary rates require a time tree
Used a non-parametric method to smooth the tree
Used several fossil calibrations from Misof et al.
Slide 40
Arthropod Time Tree
LICA: 350 mya
Holometabola: 311 mya
Slide 41
Substitution rates per site per year
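Once branches are dated, a per-year rate is the branch length in substitutions per site divided by the branch duration in years. The sketch below shows that arithmetic only; the function name and the input values are hypothetical and are not taken from the talk.

```python
def substitutions_per_site_per_year(branch_length, duration_my):
    """Convert a branch length into a per-year rate.

    branch_length: substitutions per site along the branch
    duration_my:   time spanned by the branch, in millions of years
    """
    if duration_my <= 0:
        raise ValueError("branch duration must be positive")
    return branch_length / (duration_my * 1e6)

# Toy values (hypothetical): 0.2 substitutions per site over 100 million years
rate = substitutions_per_site_per_year(0.2, 100.0)
print(f"{rate:.2e}")
```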
Slide 42
Gene duplications can lead to important functional evolution
Slides 43-45
Gene family analysis: Example
Tips: observed variables
x_i: hidden variables
Our goal is to infer the states of the internal nodes of the tree
Then we can count changes along each lineage
Tip counts (figure): 1, 3, 1, 1, 1, 0, 0
Inferred internal node counts (figure): 1, 1, 1, 1, 0
Changes along lineages (figure): +2, -1
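The slides do not name the inference method. As a stand-in, the sketch below uses a simple parsimony reconstruction (minimizing the total absolute change in copy number over the tree) and then counts the change along each branch. The tree shape and tip names are hypothetical and smaller than the tree on the slide.

```python
def reconstruct(tree, tip_counts):
    """Parsimony reconstruction of gene counts on a rooted binary tree.

    tree: nested 2-tuples of tip names; tip_counts: {tip name: observed count}.
    Returns (total change, {child node: child count - parent count}).
    """
    states = range(max(tip_counts.values()) + 1)
    inf = float("inf")

    def cost_table(node):
        # cost[s] = minimum change within the subtree if `node` has count s
        if isinstance(node, str):
            return [0 if s == tip_counts[node] else inf for s in states]
        child_tables = [cost_table(child) for child in node]
        return [
            sum(min(abs(s - t) + table[t] for t in states) for table in child_tables)
            for s in states
        ]

    changes = {}

    def assign(node, state):
        # Walk back down, giving each child its cheapest count
        # (ties go to the smaller count). Tables are recomputed for
        # brevity, which is fine for a toy tree.
        if isinstance(node, str):
            return
        for child in node:
            table = cost_table(child)
            best = min(states, key=lambda t: abs(state - t) + table[t])
            changes[child] = best - state
            assign(child, best)

    root_table = cost_table(tree)
    root_state = min(states, key=lambda s: root_table[s])
    assign(tree, root_state)
    return root_table[root_state], changes

# Toy example (hypothetical tree and tip names)
toy_tree = (("A", "B"), ("C", ("D", "E")))
toy_tips = {"A": 1, "B": 3, "C": 1, "D": 0, "E": 0}
total, changes = reconstruct(toy_tree, toy_tips)
nonzero = {node: delta for node, delta in changes.items() if delta != 0}
print(total, nonzero)
```

On this toy tree the reconstruction places a gain of 2 on the branch to tip B and a loss of 1 on the branch above D and E. A likelihood-based method would weigh branch lengths as well, which this sketch ignores.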
Slide 46
Genes gained/lost per year
Slide 47
Genes gained/lost per year
Substitutions per site per year
No correlation between gain/loss rates and substitution rates
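The slide reports the result but not the statistic behind it. As one possible check, the sketch below computes a Pearson correlation between per-lineage gain/loss rates and substitution rates. The choice of Pearson and the toy values are assumptions made for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)

# Toy per-lineage rates in arbitrary units (hypothetical)
gain_loss_rates = [1.0, 2.0, 3.0, 4.0]
substitution_rates = [2.0, 1.0, 1.0, 2.0]
r = pearson_r(gain_loss_rates, substitution_rates)
print(round(r, 3))
```

A rank-based statistic such as Spearman's would be a reasonable alternative if the rates are strongly skewed.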
Slide 48
Rapidly evolving gene families
# of rapidly evolving families
Slide 49
Rapidly evolving gene families
Several families related to venom and silk production rapidly expanding among spiders
Jessica Garb
Slides 50-51
Rapidly evolving gene families
German cockroach has highest number of rapidly evolving families, despite low gene gain/loss rate
EOG8D294J rapidly evolving only in Blattella germanica
Gained 34 genes
response to light stimulus
locomotor rhythm
Slide 52
Tip-specific gene families
Families present only in a single species
Spikes occur both in species with low-quality genomes AND in species with highly annotated genomes
# of tip specific families
Slides 53-54
Order-specific gene families
Families that are found only in that order AND in every species in that order
# of order specific families
Large number of Lepidoptera specific families
5 families with odorant/olfactory functions
3 families involved in response to stress
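The two definitions above (present in a single species only; present in every species of one order and nowhere else) can be applied directly to a table of gene counts. The sketch below is a minimal version, assuming a family-to-species count table; the function names, family IDs, and species names are hypothetical.

```python
def order_specific_families(counts, orders):
    """Map each order to families found in all of its species and in no others."""
    result = {}
    for order, members in orders.items():
        member_set = set(members)
        result[order] = sorted(
            fam for fam, per_species in counts.items()
            if all(per_species.get(sp, 0) > 0 for sp in member_set)
            and all(n == 0 for sp, n in per_species.items() if sp not in member_set)
        )
    return result

def tip_specific_families(counts):
    """Map each species to the families present in that species only."""
    result = {}
    for fam, per_species in counts.items():
        present = [sp for sp, n in per_species.items() if n > 0]
        if len(present) == 1:
            result.setdefault(present[0], []).append(fam)
    return result

# Toy data (hypothetical): family -> {species: gene count}
orders = {"OrderA": ["a1", "a2"], "OrderB": ["b1", "b2"]}
toy_counts = {
    "F1": {"a1": 2, "a2": 1},           # every OrderA species, nowhere else
    "F2": {"a1": 1, "a2": 0, "b1": 0},  # a1 only
    "F3": {"a1": 1, "a2": 1, "b1": 1},  # shared across orders
    "F4": {"b1": 3, "b2": 1},           # every OrderB species, nowhere else
}
by_order = order_specific_families(toy_counts, orders)
by_tip = tip_specific_families(toy_counts)
print(by_order, by_tip)
```

Because the order-level definition requires presence in every member species, a single missed annotation in one genome is enough to remove a family from the list.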
Slide 55
Questions
What are the relationships among species and orders?
What are the patterns of genome evolution?
What did the genome of the last insect common ancestor (LICA) look like?
Slides 56-57
What does the ancestral insect look like?
How can we infer characteristics about the genome of the last insect common ancestor (LICA)?
Slides 58-60
How can we infer characteristics of the genome of LICA?
How many genes were present in the LICA genome?
9,601 genes
[Figure: example tree with observed gene counts at the tips and hidden internal nodes x1 to x19, with the LICA node labeled]
Slide 61
Estimates of ancestral genome size are biased because of extinct gene families
Slide 62
Corrected # of genes in the LICA genome: 14,615
[Figure: # of genes, with LICA marked]
Slides 63-64
How can we infer characteristics of the genome of LICA?
Which families were ‘born’ during the transition to insects?
147 novel insect families
[Figure: example tree in which one part has gene counts of 0 at tips and internal nodes, the rest has nonzero counts, and one branch is marked +1]
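One way to read "born" here is a family whose inferred count is zero at the node just above LICA and at least one at LICA itself. The sketch below applies that reading to ancestral counts that are assumed to have been inferred already; the node labels, family IDs, and function name are hypothetical.

```python
def families_born_on_branch(ancestral_counts, parent, child):
    """Return families with count 0 at `parent` and a positive count at `child`."""
    return sorted(
        fam for fam, node_counts in ancestral_counts.items()
        if node_counts[parent] == 0 and node_counts[child] > 0
    )

# Toy inferred counts (hypothetical): family -> {node label: count}
toy_ancestral = {
    "F1": {"pre_LICA": 0, "LICA": 1},  # appears on the branch to LICA
    "F2": {"pre_LICA": 2, "LICA": 2},  # already present before LICA
    "F3": {"pre_LICA": 0, "LICA": 0},  # arises later, if at all
    "F4": {"pre_LICA": 1, "LICA": 0},  # lost on the branch to LICA
}
born = families_born_on_branch(toy_ancestral, "pre_LICA", "LICA")
print(born)
```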
Slides 65-68
Novel insect families correspond to insect lifestyle adaptations
Changes in exoskeleton development
  -7 chitin and cuticle production families
Ability to sense in a terrestrial environment
  -1 visual learning and behavior family
  -2 odorant binding families
  -5 families involved in neural activity
Unique development
  -1 larval behavior family
  -4 imaginal disk development families
Flight
  -3 wing morphogenesis families
Slides 69-70
All data has been made available in our online tool
https://cgi.soic.indiana.edu/~grthomas/i5k/i5k_phylo.html
Slide 71
Acknowledgments
Matthew Hahn
Stephen Richards
Rob Waterhouse
Jessica Garb
Elias Dohmen
Ariel Chipman
The i5k community
The Hahn lab + Clara Boothby
Gene family website: https://cgi.soic.indiana.edu/~grthomas/i5k/i5k_phylo.html
i5k website: http://i5k.github.io/