(b)ADMIXTURE Aritra Bose - PowerPoint Presentation

Uploaded On 2019-12-07

Presentation Transcript

(b)ADMIXTURE Aritra Bose FEBRUARY, 2019

Population Structure
A. Auton et al., Nature 526, 68-74 (2015), doi:10.1038/nature15393

Causes of genetic variation
Mutation: mutations are changes to the base pair sequence of the DNA.
Natural selection: genotypes that correspond to favorable traits and are heritable become more common in successive generations of a population of reproducing organisms.
Mutations increase genetic diversity. Under natural selection, beneficial mutations increase in frequency and deleterious mutations decrease.
Slides from Prof. Petros Drineas

Doodles: Khan Academy

Single Nucleotide Polymorphisms (SNPs)
Single Nucleotide Polymorphisms: the most common type of genetic variation in the genome across different individuals. They are known locations in the human genome where two alternate nucleotide bases (alleles) are observed (out of A, C, G, T).
Genotype matrix (rows: individuals, columns: SNPs):
… AG CT GT GG CT CC CC CC CC AG AG AG AG AG AA CT AA GG GG CC GG AG CG AC CC AA CC AA GG TT AG CT CG CG CG AT CT CT AG CT AG GG GT GA AG …
… GG TT TT GG TT CC CC CC CC GG AA AG AG AG AA CT AA GG GG CC GG AA GG AA CC AA CC AA GG TT AA TT GG GG GG TT TT CC GG TT GG GG TT GG AA …
… GG TT TT GG TT CC CC CC CC GG AA AG AG AA AG CT AA GG GG CC AG AG CG AC CC AA CC AA GG TT AG CT CG CG CG AT CT CT AG CT AG GG GT GA AG …
… GG TT TT GG TT CC CC CC CC GG AA AG AG AG AA CC GG AA CC CC AG GG CC AC CC AA CG AA GG TT AG CT CG CG CG AT CT CT AG CT AG GT GT GA AG …
… GG TT TT GG TT CC CC CC CC GG AA GG GG GG AA CT AA GG GG CT GG AA CC AC CG AA CC AA GG TT GG CC CG CG CG AT CT CT AG CT AG GG TT GG AA …
… GG TT TT GG TT CC CC CG CC AG AG AG AG AG AA CT AA GG GG CT GG AG CC CC CG AA CC AA GT TT AG CT CG CG CG AT CT CT AG CT AG GG TT GG AA …
… GG TT TT GG TT CC CC CC CC GG AA AG AG AG AA TT AA GG GG CC AG AG CG AA CC AA CG AA GG TT AA TT GG GG GG TT TT CC GG TT GG GT TT GG AA …
We have ~10 million SNPs in the human genome, so this matrix could have ~10 million columns.
Slides from Prof. Petros Drineas

Genotype data as a matrix
The genotype matrix recoded numerically (rows: individuals, columns: SNPs):
0 0 0 1 0 -1 1 1 1 0 0 0 0 0 1 0 1 -1 -1 1 -1 0 0 0 1 1 1 1 -1 -1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
-1 -1 -1 1 -1 -1 1 1 1 -1 1 0 0 0 1 0 1 -1 -1 1 -1 1 -1 1 1 1 1 1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 1
-1 -1 -1 1 -1 -1 1 1 1 -1 1 0 0 1 0 0 1 -1 -1 1 0 0 0 0 1 1 1 1 -1 -1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
-1 -1 -1 1 -1 -1 1 1 1 -1 1 0 0 0 1 1 -1 1 1 1 0 -1 1 0 1 1 0 1 -1 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
-1 -1 -1 1 -1 -1 1 1 1 -1 1 -1 -1 -1 1 0 1 -1 -1 0 -1 1 1 0 0 1 1 1 -1 -1 -1 1 0 0 0 0 0 0 0 0 0 1 -1 -1 1
-1 -1 -1 1 -1 -1 1 0 1 0 0 0 0 0 1 0 1 -1 -1 0 -1 0 1 -1 0 1 1 1 -1 -1 0 0 0 0 0 0 0 0 0 0 0 1 -1 -1 1
-1 -1 -1 1 -1 -1 1 1 1 -1 1 0 0 0 1 -1 1 -1 -1 1 0 0 0 1 1 1 0 1 -1 -1 1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 0 -1 -1 1
Example: AA = 1, AG = 0, GG = -1.
The matrix has dimensions m-by-n, where m is the number of individuals and n is the number of SNPs.
Slides from Prof. Petros Drineas
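The recoding on this slide can be sketched in a few lines of Python. This is only an illustration of the AA = 1, AG = 0, GG = -1 rule. The function names, the toy genotypes, and the choice of which allele counts as the reference for each SNP are assumptions made for the example, not something the slides specify.

```python
def encode_genotype(genotype, ref_allele, alt_allele):
    """Encode one two-letter genotype relative to one SNP's two alleles.

    Homozygous for ref_allele -> 1, heterozygous -> 0,
    homozygous for alt_allele -> -1 (e.g. AA = 1, AG = 0, GG = -1).
    """
    alleles = sorted(genotype)  # "GA" and "AG" are the same genotype
    if alleles == [ref_allele, ref_allele]:
        return 1
    if alleles == [alt_allele, alt_allele]:
        return -1
    if alleles == sorted([ref_allele, alt_allele]):
        return 0
    raise ValueError(f"{genotype!r} does not match alleles {ref_allele}/{alt_allele}")


def encode_matrix(genotypes, allele_pairs):
    """Encode an individuals-by-SNPs table; allele_pairs holds one (ref, alt) pair per SNP."""
    return [
        [encode_genotype(g, ref, alt) for g, (ref, alt) in zip(row, allele_pairs)]
        for row in genotypes
    ]


# Toy data (hypothetical): 3 individuals, 3 SNPs.
genotypes = [
    ["AG", "CT", "AA"],
    ["GG", "TT", "AG"],
    ["AA", "CC", "GG"],
]
allele_pairs = [("A", "G"), ("C", "T"), ("A", "G")]  # assumed reference/alternate alleles
encoded = encode_matrix(genotypes, allele_pairs)
print(encoded)
```

Which allele maps to +1 is arbitrary per SNP; flipping it only changes the sign of that column.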

Out of Africa Henn et al. 2012, PNAS

Population Bottleneck; special case: Founder Effect. Pictures: Khan Academy

Worldwide substructure
"Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation", Li et al. 2008, Science

What is admixture?
Gene flow between ancestral populations (G1 and G2). In subsequent generations, segments become shorter.

STRUCTURE or ADMIXTURE
Input: genotype matrix of m individuals by n SNPs. Output: matrix of m individuals by K ancestral components.
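To make the two matrix shapes concrete, here is a minimal sketch of the model that ADMIXTURE fits: each genotype, counted as 0, 1, or 2 copies of an allele, is treated as a binomial draw whose success probability mixes K ancestral allele frequencies. The names Q and F and all numbers are illustrative assumptions. Note that this uses 0/1/2 allele counts, not the -1/0/1 recoding shown earlier.

```python
import numpy as np

# Q: m individuals x K ancestral components; each row sums to 1 (ancestry fractions).
Q = np.array([
    [1.0, 0.0],   # individual fully from component 1
    [0.5, 0.5],   # evenly admixed individual
    [0.2, 0.8],
])

# F: K components x n SNPs; allele frequency of each SNP in each ancestral component.
F = np.array([
    [0.9, 0.1, 0.5, 0.3],
    [0.2, 0.7, 0.5, 0.6],
])

# Expected allele count (0..2) for each individual at each SNP under the model.
expected_genotypes = 2.0 * Q @ F
print(expected_genotypes.shape)  # (m, n)
print(expected_genotypes)
```

ADMIXTURE goes the other way: given the observed m-by-n genotypes and a chosen K, it estimates Q and F.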

Meta-analysis of ADMIXTURE
Our objective is to compute the shared ancestry between populations ‘X’ and ‘Y’. We create two matrices, P_X and P_Y, of dimensions m-by-K and n-by-K, containing the estimates from ADMIXTURE, and project P_Y onto the subspace of P_X. We take the top ‘k’ eigenvectors of P_X to create a matrix V_X, and perform the projection to find the shared ancestry between ‘X’ and ‘Y’ using the following: Shared Ancestry =
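The formula itself did not survive in this transcript, so the following is only a sketch of one plausible projection-based score, not the published definition. It assumes that the "eigenvectors of P_X" are the top-k right singular vectors of P_X (equivalently, eigenvectors of P_X^T P_X, which live in the K-dimensional ancestry space shared by both matrices), and that the score is the fraction of P_Y's Frobenius norm retained after projection. The function name and the toy matrices are hypothetical.

```python
import numpy as np


def shared_ancestry(P_x, P_y, k):
    """Fraction of P_y's Frobenius norm captured by the top-k subspace of P_x.

    P_x: m-by-K ADMIXTURE estimates for population X.
    P_y: n-by-K ADMIXTURE estimates for population Y.
    ASSUMPTION: the normalisation used here is illustrative only.
    """
    # Rows of Vt are right singular vectors of P_x, ordered by singular value.
    _, _, Vt = np.linalg.svd(P_x, full_matrices=False)
    V_x = Vt[:k].T                       # K-by-k basis of the top-k subspace
    projected = P_y @ V_x @ V_x.T        # projection of each row of P_y
    return np.linalg.norm(projected) / np.linalg.norm(P_y)


# Toy example: X is entirely ancestral component 1.
P_x = np.array([[1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
same = shared_ancestry(P_x, np.array([[1.0, 0.0, 0.0]]), k=1)
different = shared_ancestry(P_x, np.array([[0.0, 1.0, 0.0]]), k=1)
print(same, different)
```

Under these assumptions the score lies between 0 (Y's ancestry is orthogonal to X's dominant components) and 1 (Y lies entirely within them).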

[Figure: populations shown: Germany, Hungary, France, Italy, Sicily, Spain, Peloponnese, Serbian, Cappadocia, Crete, Ukraine, Poland, Belarus, Tuscany, Basque, Iberia, Kurds, Maronites, Lebanon, Syria, Pontus. Groups: Peloponnesean Greeks, Slavs, Near Easterners]

Shared Ancestry
Population  Belarusians  Russians    Polish     Ukrainians  French       Italians     Basque     Andalusians
Argolis     5.4 (1.5)    12.2 (1.2)  5.8 (0.8)  6.8 (1.1)   39.1 (19.2)  94.7 (4.8)   2.8 (1.4)  60.5 (5.9)
Corinthia   5.9 (1.7)    13.0 (1.3)  6.3 (1)    7.5 (1.3)   41.2 (18.5)  94.9 (4.0)   3.1 (1.7)  62.0 (5.9)
Achaea      6.5 (1.7)    13.8 (1.1)  7.0 (0.8)  8.1 (1.1)   41.4 (18.4)  94.8 (4.0)   2.7 (1.4)  61.3 (5.8)
Arcadia     5.3 (1.8)    10.9 (2.4)  5.2 (1.2)  6.2 (1.5)   39.1 (18.2)  85.4 (14.6)  2.4 (1.4)  53.8 (9.1)
Elis        6.1 (1.3)    13.1 (1.2)  6.5 (0.8)  7.6 (1.1)   41.4 (18.3)  95.0 (3.3)   3.3 (1.7)  61.6 (5.6)
Messenia    6.7 (1.7)    14.4 (1.2)  7.3 (0.9)  8.5 (1.2)   42.6 (18.4)  95.2 (4.0)   2.7 (1.3)  61.8 (5.7)
Laconia     4.8 (1.2)    11.4 (1.5)  5.2 (0.9)  6.4 (1.1)   41.1 (14.6)  96.1 (2.3)   2.3 (1.4)  59.8 (5.6)
Shared ancestry between each pair of populations for K = 4:8. Standard deviations are given in parentheses. Stamatoyannopoulos, Bose et al., EJHG (2017)

Is ADMIXTURE bad?  If these assumptions are not validated, there is substantial danger of over-interpretation. 

The curious case of K
"Ancient human genomes suggest three ancestral populations for present-day Europeans", Nature (2014)

Indistinguishable ADMIXTURE (a) Simplified schematic of each simulation scenario; (b) Inferred ADMIXTURE plots at K=11; (c) CHROMOPAINTER inferred palettes

Trans-Atlantic Slave Trade From: Atlas of the Transatlantic Slave Trade, Eltis and Richardson, based on www.slavevoyages.org

CHROMOPAINTER Shown to have high sensitivity for detecting subtle, fine-scale population structure.

CHROMOPAINTER has the advantage of using linkage information
If a reference genome is available for the study organism, it is possible to know the position of each of the genotyped SNPs. E.g.
AGCTCCCGTAACG
AGCTCCGGTAACG
This provides additional information that can be used in CHROMOPAINTER to improve the accuracy of inferences.

The Linkage model
In the absence of recombination, entire chromosomes are inherited as single genetic blocks. Recombination breaks up ancestral relationships between regions of the chromosome. The likelihood of a recombination event between two SNPs increases with the physical distance between them, so SNPs that are further apart are more likely to be inherited independently.
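As a rough illustration of how recombination probability grows with distance, the sketch below uses Haldane's map function, r = 0.5 * (1 - exp(-2d)), with d in Morgans. The slides do not say which map function CHROMOPAINTER uses. The uniform rate of 1 cM per Mb is an assumed round figure for the example, and the function name is hypothetical.

```python
import math

CM_PER_MB = 1.0  # ASSUMPTION: uniform recombination rate, roughly 1 centimorgan per megabase


def recombination_fraction(distance_bp, cm_per_mb=CM_PER_MB):
    """Probability that two SNPs distance_bp apart are separated by recombination
    in one meiosis, under Haldane's map function."""
    morgans = distance_bp / 1e6 * cm_per_mb / 100.0  # bp -> Mb -> cM -> Morgans
    return 0.5 * (1.0 - math.exp(-2.0 * morgans))


for distance in (1_000, 100_000, 1_000_000, 100_000_000):
    print(f"{distance:>11,} bp -> r = {recombination_fraction(distance):.5f}")
```

The value rises from near 0 for neighbouring SNPs towards 0.5 for distant ones, and 0.5 corresponds to fully independent inheritance.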

Figure 1. Illustration of the painting process to create the coancestry matrix. We show the process by which a haplotype (haplotype 1, black) is painted using the others. A) True underlying genealogies for eight simulated sequences at three locations along a genomic segment, produced using the program ‘ms’ and showing coalescence times between haplotypes at each position. B) The Time to the Most Recent Common Ancestor (TMRCA) between haplotype 1 and each other haplotype, as a function of sequence position. Note that multiple haplotypes can share the same TMRCA, and changes in TMRCA correspond to historical recombination sites. C) True distribution of the ‘nearest neighbour’ haplotype. D) Sample ‘paintings’ of the Li & Stephens algorithm. E) Expectation of the painting process, estimating the nearest neighbour distribution. F) Resulting row of the coancestry matrix, based on the expectation of the painting.
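Panels D to F can be illustrated with a heavily simplified Li & Stephens style copying model: a hidden Markov model whose hidden state is the donor haplotype being copied at each site. This is a sketch under stated assumptions, not CHROMOPAINTER itself. It uses one fixed switch probability between adjacent sites and one fixed mismatch probability, and it summarises the painting as the expected fraction of sites copied from each donor, whereas the real coancestry matrix counts copied chunks. All names and parameter values are hypothetical.

```python
import numpy as np


def paint(recipient, donors, switch_prob=0.1, mismatch_prob=0.01):
    """Forward-backward pass of a simplified copying HMM.

    recipient: length-L array of 0/1 alleles.
    donors:    D-by-L array of 0/1 alleles.
    Returns (posterior, palette): posterior[l, d] is the probability that the
    recipient copies donor d at site l; palette is its average over sites.
    """
    recipient = np.asarray(recipient)
    donors = np.asarray(donors)
    D, L = donors.shape

    # emission[l, d]: probability of the recipient's allele if copying donor d at site l.
    emission = np.where(donors.T == recipient[:, None], 1.0 - mismatch_prob, mismatch_prob)

    # Forward pass; each row is normalised to avoid underflow.
    forward = np.zeros((L, D))
    forward[0] = emission[0] / D
    forward[0] /= forward[0].sum()
    for l in range(1, L):
        # Stay on the same donor, or switch to a uniformly chosen donor.
        prior = (1.0 - switch_prob) * forward[l - 1] + switch_prob / D
        forward[l] = prior * emission[l]
        forward[l] /= forward[l].sum()

    # Backward pass with the same transition structure.
    backward = np.ones((L, D))
    for l in range(L - 2, -1, -1):
        weighted = emission[l + 1] * backward[l + 1]
        backward[l] = (1.0 - switch_prob) * weighted + switch_prob * weighted.sum() / D
        backward[l] /= backward[l].sum()

    posterior = forward * backward
    posterior /= posterior.sum(axis=1, keepdims=True)
    palette = posterior.mean(axis=0)  # expected share of sites copied from each donor
    return posterior, palette


# Toy data: the recipient matches donor 0 on the left half and donor 1 on the right half.
recipient = [0, 0, 0, 0, 1, 1, 1, 1]
donors = [[0, 0, 0, 0, 0, 0, 0, 0],
          [1, 1, 1, 1, 1, 1, 1, 1]]
posterior, palette = paint(recipient, donors)
print(np.round(posterior, 3))
print(np.round(palette, 3))
```

The point where the posterior shifts from one donor to the other plays the role of the historical recombination site in panel B.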

Painting Palettes Plot: Paintmychrosomes.org

fineSTRUCTURE
How to group by palettes? Each individual starts with their own palette, and similar palettes are progressively merged until the remaining palettes are sufficiently distinct to be statistically differentiated from each other. A and B start with very similar palettes and merge in the first iteration. Similarly, C and D merge within the first 3 iterations. Therefore, this dataset has three distinct populations.
fineSTRUCTURE is an MCMC algorithm. With enough iterations, it gives an appropriate indication of the statistical uncertainty of the clustering.
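The merging idea can be mimicked with a greedy agglomerative sketch. This is a swapped-in, much simpler technique: fineSTRUCTURE itself is an MCMC algorithm with a statistical model for when palettes are distinguishable. Here a fixed distance threshold stands in for that test. The threshold, the L1 distance, the palette values, and individual E are all assumptions made for the example.

```python
import numpy as np


def merge_palettes(palettes, threshold):
    """Repeatedly merge the two closest clusters until the closest pair is
    further apart than threshold. NOT fineSTRUCTURE's MCMC; illustration only."""
    clusters = [[i] for i in range(len(palettes))]
    centroids = [np.asarray(p, dtype=float) for p in palettes]
    while len(clusters) > 1:
        # Find the closest pair of cluster centroids (L1 distance).
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = np.abs(centroids[a] - centroids[b]).sum()
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        if dist > threshold:
            break  # remaining palettes are treated as distinct
        # Merge b into a, using a size-weighted average palette.
        size_a, size_b = len(clusters[a]), len(clusters[b])
        centroids[a] = (size_a * centroids[a] + size_b * centroids[b]) / (size_a + size_b)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        del centroids[b]
    return clusters


# Hypothetical palettes for individuals A, B, C, D, E (indices 0..4).
palettes = [
    [0.80, 0.10, 0.10],  # A
    [0.78, 0.12, 0.10],  # B: very close to A
    [0.10, 0.80, 0.10],  # C
    [0.15, 0.75, 0.10],  # D: close to C
    [0.10, 0.10, 0.80],  # E
]
clusters = merge_palettes(palettes, threshold=0.2)
print(clusters)
```

With these toy values, A and B merge first, then C and D, leaving three groups, which mirrors the slide's example.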

badMIXTURE
(d) Painting residuals after fitting optimal ancestral palettes using badMIXTURE, on the residual scale shown. (e) Ancestral palettes estimated by badMIXTURE. 13 populations in total were simulated, with the grey populations all being outgroups to those shown in colour.
Residuals are the difference between the observed palettes for each individual in the simulated data and those obtained by badMIXTURE.
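The residual idea can be sketched as follows. If ADMIXTURE says individual i has ancestry fractions Q[i], and each ancestral population has a palette, then the individual's palette should be the Q-weighted mix of the ancestral palettes. The sketch fits ancestral palettes by plain unconstrained least squares and reports observed minus fitted. This is a simplification: badMIXTURE is an R package and its actual fitting procedure may differ. The function name and all numbers are hypothetical.

```python
import numpy as np


def palette_residuals(Q, palettes):
    """Fit ancestral palettes A so that Q @ A approximates the observed palettes.

    Q:        m-by-K ancestry fractions (e.g. from ADMIXTURE).
    palettes: m-by-D observed painting palettes (e.g. from CHROMOPAINTER).
    Returns (A, residuals) with residuals = observed - fitted.
    ASSUMPTION: ordinary least squares, with no constraints on A.
    """
    A, *_ = np.linalg.lstsq(Q, palettes, rcond=None)
    residuals = palettes - Q @ A
    return A, residuals


# Toy data: 4 individuals, K = 2 ancestral components, D = 3 donor groups.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5],
              [0.5, 0.5]])
A_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])
palettes = Q @ A_true
palettes[3] = [0.2, 0.2, 0.6]  # individual 3 does not look like a simple mixture

A_hat, residuals = palette_residuals(Q, palettes)
print(np.round(residuals, 3))
```

A systematic pattern in the residuals, such as the large row for individual 3 here, signals that the simple mixture model does not explain the palettes.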

badMIXTURE Interpretation
There is no pattern in the residuals for “Recent Admixture”. The “Ghost Admixture” and “Recent Bottleneck” scenarios show a distinctive pattern in the residuals, specifically for P1 and P2. In (e), “Ghost Admixture” results in a more uniform palette, as the other models contain bottlenecks. In “Recent Admixture”, the probability of two members of P2 sharing the same ancestry source is less than that of a member of P2 sharing it with P1. badMIXTURE correctly distinguishes that.

Is badMIXTURE any good?

Ari in Ethiopia
(a), (b) & (c) ADMIXTURE (K=11) of Ari and the neighboring Ethiopian groups Afar and Somali from the three studies, respectively. CHROMOPAINTER inferred painting palettes.
Origins of the Ari Blacksmiths, an occupational caste-like group in Ethiopian Ari societies. Two hypotheses exist: they are traditionally explained either as remnants of hunter-gatherer (HG) groups assimilated by the expansion of agricultural communities in the Neolithic period, or as a group marginalized within agricultural communities due to their craft skills.

bad(good)MIXTURE
Van Dorp et al. did other analyses beyond ADMIXTURE, using fineSTRUCTURE, to show that the other two prior works concluded a false history. They found evidence of a scenario analogous to “Recent Bottleneck”: Blacksmiths and Cultivators diverged from each other principally by a bottleneck in the Blacksmiths, which was likely a consequence of their marginalized status. Accounting for this, they have similar ancestry profiles and history in (f).

Good riddance due to badMIXTURE
STRUCTURE and ADMIXTURE are popular because they give a broad-brush view of variation. For more information, one can zoom into target populations.
History is rarely as simple as an ADMIXTURE model assumes. Naïve inferences based on this model can be misleading if K is inappropriate or there are too few samples.
Unobserved ancient structure, bottleneck, and genetic drift scenarios have to be accounted for.
badMIXTURE is an R package that can be used to assess the model fit of STRUCTURE/ADMIXTURE results for each individual.
badMIXTURE provides a validation framework for the ubiquitously used model-based methods in population genetics.