Wellcome Trust Advanced Courses: Genomic Epidemiology in Africa, 21st – 26th June 2015, Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Durban, South Africa
Slide1
Population genetics
Dr Gavin Band
Wellcome Trust Advanced Courses; Genomic Epidemiology in Africa, 21st – 26th June 2015
Africa Centre for Health and Population Studies, University of KwaZulu-Natal, Durban, South Africa
Slide2
Introductions
meta-analysis and power of genetic studies
Genetics
GWAS results and interpretation
GWAS QC
Basic principles of measuring disease in populations
Principal components analyses
Basic genotype data summaries and analyses
GWAS association analyses
Bioinformatics
Public databases and resources for genetics
whole genome sequencing and fine-mapping
Epidemiology
population genetics
Slide3
Let’s imagine we’ve collected and sequenced some samples...
ATAGAAAGACCAGACTCCATCGCTAGCAGCTACGCTAGAGTTA
ATTGAAAGACCATACTCCATCGCTAGCAGC-ACGCTAGAGTTA
ATAGAAAGACCAGACTCCATCGCAAGCAGC-ACCCTAGCGTTA
ATAGAAAGACCAGACTCCATCGCAAGCAGCTACGCTAGAGTTA
...
ATAGATAGACCATACTGCATCGCAAGCAGCTACGCTAGCGTTA
(K samples)
Slide4
Let’s imagine we’ve collected and sequenced some samples...
[Figure: the same K aligned sequences, with the variable sites highlighted: SNPs, and an insertion / deletion polymorphism (the '-' positions)]
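As a sketch of how such variable sites can be picked out, the hypothetical helper `classify_sites` below scans the aligned columns of the example sequences. A column holding two or more different bases is labelled a SNP; a column where some samples carry a gap ('-') is labelled an insertion / deletion. It assumes the sequences are already aligned to equal length.

```python
def classify_sites(seqs):
    """Return (snp_columns, indel_columns) as 0-based column indices."""
    length = len(seqs[0])
    assert all(len(s) == length for s in seqs), "sequences must be aligned"
    snps, indels = [], []
    for i in range(length):
        column = {s[i] for s in seqs}
        if len(column) == 1:
            continue  # every sample has the same base here
        if "-" in column:
            indels.append(i)  # some samples carry a gap at this column
        else:
            snps.append(i)  # two or more different bases
    return snps, indels


samples = [
    "ATAGAAAGACCAGACTCCATCGCTAGCAGCTACGCTAGAGTTA",
    "ATTGAAAGACCATACTCCATCGCTAGCAGC-ACGCTAGAGTTA",
    "ATAGAAAGACCAGACTCCATCGCAAGCAGC-ACCCTAGCGTTA",
    "ATAGAAAGACCAGACTCCATCGCAAGCAGCTACGCTAGAGTTA",
    "ATAGATAGACCATACTGCATCGCAAGCAGCTACGCTAGCGTTA",
]
snps, indels = classify_sites(samples)
print("SNP columns:", snps)      # SNP columns: [2, 5, 12, 16, 23, 33, 38]
print("indel columns:", indels)  # indel columns: [30]
```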
Slide5
Let’s imagine we’ve collected and sequenced some samples...
[Figure: the same K aligned sequences, with segments of each sequence highlighted]
Slide6
Yoruba from Ibadan, Nigeria
Utah residents, ancestrally Northern and Western European
24 haplotypes (12 individuals), 100 SNPs on chromosome 20
Slide7
Key questions
What should we expect to observe?
How can we interpret observed patterns?
What processes generated this data?
Slide8
Key ancestral processes
Genetic drift
Mutation
Recombination
(and selection)
Slide9
A simple model of a population
[Figure: a population of 2N chromosomes traced from the present back into the past over G generations]
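The slides do not name the model, but the picture of 2N chromosomes, each inheriting from a parent chromosome in the previous generation, matches the standard Wright-Fisher model, which is assumed here. A minimal sketch: every generation each chromosome copies a parent chosen uniformly at random, so it carries a given allele with probability equal to that allele's current frequency. The function name `wright_fisher` and all parameter values are illustrative.

```python
import random


def wright_fisher(p0, two_n, generations, seed=None):
    """Neutral Wright-Fisher sketch: return allele frequencies, one per generation.

    two_n is the number of chromosomes (2N). Each generation, every chromosome
    picks a parent uniformly at random, so it carries the allele with
    probability p (the current frequency).
    """
    rng = random.Random(seed)
    count = round(p0 * two_n)
    freqs = [count / two_n]
    for _ in range(generations):
        p = count / two_n
        # 2N independent parent choices amount to one binomial(2N, p) draw
        count = sum(rng.random() < p for _ in range(two_n))
        freqs.append(count / two_n)
    return freqs


# Hypothetical settings: 2N = 100 chromosomes, 50 generations, starting at p = 0.5
trajectory = wright_fisher(p0=0.5, two_n=100, generations=50, seed=1)
print([round(f, 2) for f in trajectory[::10]])
```

Re-running with different seeds shows the frequency wandering at random; once it hits 0 (lost) or 1 (fixed) it stays there.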
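Slide10
A simple model of a population
[Figure: the same population traced from the present back into the past over G generations]
Slide11
A simple model of a population
[Figure: the same population traced from the present back into the past over G generations]
Slide12
A simple model of a population
[Figure: the same population traced from the present back into the past over G generations]
Slide13
A simple model of a population
[Figure: the same population traced from the present back into the past over G generations]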
Slide14
Genetic drift
Slide15
Genetic drift
[Figure: a population traced from the present back into the past over G generations]
π = 1.49 versus π = 0.35 (π is the mean number of pairwise differences)
Genetic drift reduces diversity (it makes everyone look the same)
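The diversity statistic π is the mean number of pairwise differences. A minimal sketch of the calculation, on a made-up set of four haplotypes coded as strings of 0/1 alleles (the slide's own values come from the figure and are not reproduced here):

```python
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """pi: number of sites at which two haplotypes differ, averaged over all pairs."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


# Hypothetical haplotypes at four SNPs (0 = one allele, 1 = the other)
haps = ["0011", "0011", "0111", "1011"]
print(mean_pairwise_differences(haps))  # 1.0 (6 differences over 6 pairs)
```

As drift carries some haplotypes to high frequency, more pairs are identical and π falls.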
Slide16
Genetic drift
[Figure: a population traced from the present back into the past over G generations]
r² = 0.33 and r² = 0.51 (each between a pair of alleles highlighted in the figure)
Genetic drift creates correlations between alleles (it increases LD)
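Slide17
Genetic drift
[Figure: a population traced from the present back into the past over G generations]
p(1-p) = 0.24 versus p(1-p) = 0.16
Genetic drift decreases heterozygosity (p(1-p) is half the expected heterozygosity 2p(1-p) at a biallelic site)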
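Under the neutral Wright-Fisher model (an assumption; the slide does not state its model), the expected value of p(1-p) shrinks by a factor (1 - 1/2N) each generation, so after t generations E[p_t(1 - p_t)] = p_0(1 - p_0)(1 - 1/2N)^t. The sketch below checks this by simulation. A starting frequency p_0 = 0.4 gives p(1-p) = 0.24 as on the slide; 2N = 20 and t = 8 are hypothetical choices.

```python
import random


def expected_pq(p0, two_n, t):
    """Expected p(1-p) after t generations of neutral drift."""
    return p0 * (1 - p0) * (1 - 1 / two_n) ** t


def simulated_pq(p0, two_n, t, replicates=2000, seed=0):
    """Average p(1-p) across independent Wright-Fisher replicates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replicates):
        count = round(p0 * two_n)
        for _ in range(t):
            p = count / two_n
            count = sum(rng.random() < p for _ in range(two_n))
        p = count / two_n
        total += p * (1 - p)
    return total / replicates


theory = expected_pq(0.4, 20, 8)
sim = simulated_pq(0.4, 20, 8)
print(round(theory, 3), round(sim, 3))
```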
Slide18
Size matters
In a smaller population:
- Genetic drift acts faster. E.g.: approximate variance in allele frequency after s generations
[Figure: K = 100, 50 generations]
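The formula for the approximate variance in allele frequency after s generations did not survive extraction from the slide. A standard result for the neutral Wright-Fisher model, assumed here and possibly written differently on the original slide, is Var(p_s) = p_0(1 - p_0)[1 - (1 - 1/2N)^s], which is approximately p_0(1 - p_0)s/2N when s is small compared with 2N. Both are sketched below; the population sizes are hypothetical.

```python
def drift_variance(p0, two_n, s):
    """Variance of allele frequency after s generations of neutral drift."""
    return p0 * (1 - p0) * (1 - (1 - 1 / two_n) ** s)


def drift_variance_approx(p0, two_n, s):
    """Small-s approximation: p0 (1 - p0) s / 2N."""
    return p0 * (1 - p0) * s / two_n


# Smaller populations (smaller 2N) drift faster: larger variance after 50 generations
for two_n in (100, 1_000, 10_000):
    exact = drift_variance(0.5, two_n, 50)
    approx = drift_variance_approx(0.5, two_n, 50)
    print(f"2N={two_n:>6}: exact={exact:.4f} approx={approx:.4f}")
```

The approximation is good when s/2N is small and overstates the variance otherwise: for 2N = 100 and s = 50 it gives 0.125 against an exact value of about 0.099.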
Slide19
Size matters
In a smaller population:
- Genetic drift acts faster. E.g.: approximate variance in allele frequency after s generations
- There is more relatedness. E.g.: the probability that two samples coalesce (i.e. have the same parent) in the previous generation is 1/2N, and the expected time to the most recent common ancestor of two samples is 2N generations
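The two numbers 1/2N and 2N are linked: if two lineages pick the same parent with probability 1/2N in each generation, the number of generations back to their common ancestor is geometrically distributed with mean 2N. A quick simulation of that claim, with 2N = 100 chosen arbitrarily:

```python
import random


def coalescence_time(two_n, rng):
    """Generations back until two lineages first pick the same parent chromosome."""
    generations = 0
    while True:
        generations += 1
        # Each lineage picks its parent uniformly from the 2N chromosomes
        if rng.randrange(two_n) == rng.randrange(two_n):
            return generations


rng = random.Random(42)
two_n = 100
times = [coalescence_time(two_n, rng) for _ in range(5000)]
print(sum(times) / len(times), "vs expected", two_n)
```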
Slide20
Example: a bottleneck
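One way to see why a bottleneck leaves a lasting signature: with no new mutation, expected heterozygosity is multiplied by (1 - 1/2N) in every generation, using that generation's size N (this is the neutral Wright-Fisher expectation, assumed here). The population history below, in diploid individuals per generation, is hypothetical: a large population interrupted by 10 generations at N = 50.

```python
def expected_heterozygosity(h0, sizes):
    """Expected heterozygosity after each generation, given diploid sizes N per generation."""
    trace = [h0]
    for n in sizes:
        trace.append(trace[-1] * (1 - 1 / (2 * n)))
    return trace


history = [10_000] * 100 + [50] * 10 + [10_000] * 100  # hypothetical sizes
trace = expected_heterozygosity(0.5, history)

before, after_bottleneck, end = trace[100], trace[110], trace[-1]
print(f"lost before bottleneck: {1 - before / trace[0]:.3f}")
print(f"lost during bottleneck: {1 - after_bottleneck / before:.3f}")
print(f"lost afterwards:        {1 - end / after_bottleneck:.3f}")
```

Most of the loss happens during the 10 bottleneck generations, and without mutation it is never regained.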
Slide21
Yoruba from Ibadan, Nigeria
Utah residents, ancestrally Northern and Western European
24 haplotypes (12 individuals), 100 SNPs on chromosome 20
Slide22
Genetic drift summary
Genetic drift decreases diversity by causing haplotypes to fluctuate in frequency, so that alleles are lost and everyone starts looking the same. This creates correlations between alleles along chromosomes (i.e. it creates LD).
Genetic drift acts faster in smaller populations. In the same way, individuals in smaller populations tend to be more closely related.
Simple population genetic models are definitely wrong, but still useful in understanding genetic variation.
Slide23
An acknowledgement
To make these slides I’ve used a modified version of code originally written by Graham Coop. I’ll make this code available on the course materials site, but the original code is here:
https://github.com/cooplab/popgen-notes/
Graham’s group website www.gcbias.org is also a good place to look for information on population genetics topics.
Slide24
Ancestral processes
Rates per generation, looking back at a pair of lineages:
- Mutation: 2μ
- Recombination: 2r
- Coalescence: 1/2N
If only drift were operating, we’d all look identical to each other. Something must be acting against drift.
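Treating the three ancestral-process rates (2μ for mutation, 2r for recombination, 1/2N for coalescence) as competing events, the chance that the first event met going back in time is of each type is proportional to its rate. This is an approximation that assumes each rate is small, so at most one event happens per generation. The numbers (N = 10,000 and per-generation rates μ = r = 10^-5 for a hypothetical stretch of sequence) are illustrative only.

```python
def first_event_probabilities(n, mu, r):
    """Probability that each event is the first to happen to a pair of lineages.

    n: diploid population size (2N chromosomes); mu, r: per-generation
    mutation and recombination rates for the stretch of sequence considered.
    """
    rates = {
        "mutation": 2 * mu,          # either lineage can mutate
        "recombination": 2 * r,      # either lineage can recombine
        "coalescence": 1 / (2 * n),  # both lineages pick the same parent
    }
    total = sum(rates.values())
    return {event: rate / total for event, rate in rates.items()}


probs = first_event_probabilities(n=10_000, mu=1e-5, r=1e-5)
for event, p in probs.items():
    print(f"{event:>13}: {p:.3f}")
```

Raising N lowers the coalescence rate, giving mutation and recombination more chances to act first; this is how they counteract drift.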
Mutation
[Figure: 2N chromosomes traced from the present back into the past over G generations, with new mutations arising]
Genetic drift means most mutations that arise are lost.
Some survive and contribute to genetic variation in the population.
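To see that most new mutations are lost, the sketch below starts each neutral mutation as a single copy among 2N chromosomes and runs a Wright-Fisher simulation (the assumed model) until the mutation is lost or fixed. The standard result is that a new neutral mutation fixes with probability 1/2N; 2N = 20 is a hypothetical, deliberately small value so the simulation runs quickly.

```python
import random


def new_mutation_fixes(two_n, rng):
    """Follow one new neutral mutation (1 copy in 2N) until it is lost or fixed."""
    count = 1
    while 0 < count < two_n:
        p = count / two_n
        count = sum(rng.random() < p for _ in range(two_n))
    return count == two_n


rng = random.Random(7)
two_n = 20
replicates = 5000
fixed = sum(new_mutation_fixes(two_n, rng) for _ in range(replicates))
print(f"fraction fixed: {fixed / replicates:.3f} (theory 1/2N = {1 / two_n:.3f})")
print(f"fraction lost:  {1 - fixed / replicates:.3f}")
```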
Slide26
Ancestral processes
Rates per generation, looking back at a pair of lineages:
- Mutation: 2μ
- Recombination: 2r
- Coalescence: 1/2N
If only drift were operating, we’d all look identical to each other. Something must be acting against drift.
[Figure: recombination between the paternal (father) and maternal (mother) chromosomes; transmitted chromosomes shown with no recombination and with recombination]
Slide28
Recombination
Recombination breaks down the correlation between alleles
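In an idealised, infinitely large, randomly mating population (no drift, which is an assumption here), a fraction c of gametes are recombinant between two SNPs each generation, and the standard consequence is that D decays as D_t = D_0(1 - c)^t, where D = f_AB - f_A f_B. The sketch iterates the haplotype-frequency update directly and checks it against that formula; c = 0.1 and the starting haplotypes are hypothetical.

```python
def disequilibrium(f):
    """D = f_AB - f_A * f_B from a dict of haplotype frequencies."""
    freq_A = f["AB"] + f["Ab"]
    freq_B = f["AB"] + f["aB"]
    return f["AB"] - freq_A * freq_B


def next_generation(f, c):
    """One generation of recombination (fraction c recombinant) in an infinite population."""
    d = disequilibrium(f)
    return {"AB": f["AB"] - c * d, "Ab": f["Ab"] + c * d,
            "aB": f["aB"] + c * d, "ab": f["ab"] - c * d}


freqs = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}  # start in complete LD
c = 0.1
d0 = disequilibrium(freqs)
for t in range(1, 6):
    freqs = next_generation(freqs, c)
    print(t, round(disequilibrium(freqs), 4), round(d0 * (1 - c) ** t, 4))
```

The update leaves each SNP's allele frequencies unchanged; only the association between them decays.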
Slide29
Recombination in humans has a complex, interesting structure
Slide30
Recombination clusters along chromosomes
Studies have shown that recombination is not uniform along chromosomes
[Figure: recombination rate along a chromosome, in centiMorgans per Mb]
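A recombination rate in centiMorgans per Mb is the change in genetic-map position divided by the change in physical position. A minimal sketch with a made-up four-marker map, in which one short interval contains a hotspot:

```python
def local_rates(positions_bp, map_cm):
    """Recombination rate (cM/Mb) for each interval between consecutive markers."""
    rates = []
    for i in range(1, len(positions_bp)):
        delta_cm = map_cm[i] - map_cm[i - 1]
        delta_mb = (positions_bp[i] - positions_bp[i - 1]) / 1e6
        rates.append(delta_cm / delta_mb)
    return rates


# Hypothetical genetic map: physical positions (bp) and cumulative map positions (cM)
positions = [0, 100_000, 102_000, 300_000]
genetic_map = [0.0, 0.01, 0.21, 0.23]
rates = local_rates(positions, genetic_map)
for (start, end), rate in zip(zip(positions, positions[1:]), rates):
    print(f"{start:>7}-{end:<7} {rate:8.2f} cM/Mb")
```

The 2 kb interval carries most of the map distance, which is what a hotspot looks like on a plot of cM per Mb.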
Slide31
Hotspots and haplotypes
Hotspots can break down correlations over short distances
Slide32
Hotspots and haplotypes
Recombination hotspots lead to regions of strong correlation separated by regions of low LD
[Figure: recombination rate along the region]
Slide33
Measuring correlations
In genetics, correlation between alleles is called linkage disequilibrium (LD).
There are several measures of LD.
Understanding LD in natural populations is important for genomic epidemiology.
Slide34
Linkage equilibrium
[Figure: the four haplotypes AB, Ab, aB and ab, formed from alleles A/a at one SNP and B/b at another]
Here, haplotype frequencies are determined by SNP allele frequencies (they are in equilibrium):
$f_{AB} = f_A f_B$
Slide35
Linkage disequilibrium
[Figure: the four haplotypes AB, Ab, aB and ab]
Here, haplotype frequencies differ from those expected if the SNPs are independent (they are in disequilibrium):
$f_{AB} \neq f_A f_B$
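Slide36
Measuring LD
$D = f_{AB} - f_A f_B$
D ≈ 0 when near linkage equilibrium; D ≠ 0 when there is linkage disequilibrium.
Two commonly used measures:
- r² = the (squared) correlation between the two SNPs
- D' = D divided by its maximum possible magnitude given the allele frequencies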
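A sketch of the LD measures computed from two-SNP haplotype counts. D follows the definition D = f_AB - f_A f_B, r² is D² divided by the product of the four allele frequencies, and for D' the usual normalisation (D divided by the largest value it could take with the same allele frequencies) is assumed. The counts are hypothetical.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2, D_prime) from counts of the four two-SNP haplotypes.

    Both SNPs must be polymorphic, otherwise r2 and D' are undefined.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    f_AB = n_AB / n
    f_A = (n_AB + n_Ab) / n
    f_B = (n_AB + n_aB) / n
    f_a, f_b = 1 - f_A, 1 - f_B
    d = f_AB - f_A * f_B
    # r2: squared correlation between the alleles at the two SNPs
    r2 = d * d / (f_A * f_a * f_B * f_b)
    # D': scale D by its maximum possible magnitude given the allele frequencies
    if d >= 0:
        d_max = min(f_A * f_b, f_a * f_B)
    else:
        d_max = min(f_A * f_B, f_a * f_b)
    return d, r2, d / d_max


d, r2, d_prime = ld_measures(40, 10, 10, 40)
print(round(d, 3), round(r2, 3), round(d_prime, 3))  # 0.15 0.36 0.6
```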
Slide37
Haplotypes and LD
[Figure: example haplotype patterns, numbered 1 to 4]
r² is less than one unless SNP A is a perfect surrogate of SNP B in the sample.
|D'| is less than one if and only if all four haplotypes are present in the sample.
So |D'| is 1 unless visible recombination has occurred.
Slide38
Haplotypes and LD
[Figure: example haplotype patterns, numbered 1 to 4]
r² is less than one unless SNP A is a perfect surrogate of SNP B in the sample.
|D'| is less than one if and only if all four haplotypes are present in the sample.
So |D'| is 1 unless visible recombination has occurred.
Examples shown:
- r² = 1, |D'| = 1
- r² < 1, |D'| = 1
- r² < 1, |D'| < 1
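The three example patterns (r² = 1 with |D'| = 1; r² < 1 with |D'| = 1; r² < 1 with |D'| < 1) can be reproduced with made-up haplotype counts. The sketch repeats a compact r² and |D'| calculation so it stands alone, using the usual D' normalisation (an assumption), and checks whether all four haplotypes are present.

```python
def r2_and_abs_dprime(n_AB, n_Ab, n_aB, n_ab):
    """r^2 and |D'| from two-SNP haplotype counts (both SNPs polymorphic)."""
    n = n_AB + n_Ab + n_aB + n_ab
    f_A, f_B = (n_AB + n_Ab) / n, (n_AB + n_aB) / n
    d = n_AB / n - f_A * f_B
    r2 = d * d / (f_A * (1 - f_A) * f_B * (1 - f_B))
    if d >= 0:
        d_max = min(f_A * (1 - f_B), (1 - f_A) * f_B)
    else:
        d_max = min(f_A * f_B, (1 - f_A) * (1 - f_B))
    return r2, abs(d / d_max)


# Hypothetical counts, in the order (AB, Ab, aB, ab)
examples = {
    "two haplotypes (AB, ab)": (50, 0, 0, 50),
    "three haplotypes (AB, Ab, ab)": (50, 25, 0, 25),
    "all four haplotypes": (40, 10, 10, 40),
}
for label, counts in examples.items():
    r2, abs_dprime = r2_and_abs_dprime(*counts)
    all_four = all(c > 0 for c in counts)
    print(f"{label:<30} r2={r2:.3f} |D'|={abs_dprime:.3f} all four present: {all_four}")
```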
Slide39
Recombination and LD
Slide40
Population genetic processes summary
Genetic drift decreases diversity and heterozygosity, and increases levels of LD. It acts faster in smaller populations.
Mutations occur at a rate of about 60 per diploid genome per generation, but most are lost due to drift.
Recombination breaks down correlations between alleles. It occurs in a highly nonuniform manner, clustered into recombination hotspots.
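A back-of-the-envelope check on the figure of about 60 mutations per diploid genome per generation. The haploid genome length of roughly 3.1 × 10^9 bp and the population size N = 10,000 are assumptions for illustration, and the 1/2N fixation probability of a new neutral mutation is the standard Wright-Fisher result, also assumed here.

```python
MUTATIONS_PER_DIPLOID_GENOME = 60  # from the slide
HAPLOID_GENOME_BP = 3.1e9          # assumption: approximate human genome length
N = 10_000                         # assumption: hypothetical diploid population size

# Per-base-pair mutation rate per generation
mu_per_bp = MUTATIONS_PER_DIPLOID_GENOME / (2 * HAPLOID_GENOME_BP)

# New mutations entering the whole population each generation
new_per_generation = MUTATIONS_PER_DIPLOID_GENOME * N

# Each new neutral mutation eventually fixes with probability 1/2N
expected_fixations = new_per_generation / (2 * N)

print(f"mu per bp per generation ~ {mu_per_bp:.2e}")  # ~ 9.68e-09
print(f"new mutations per generation: {new_per_generation}")  # 600000
print(f"expected to fix eventually:   {expected_fixations}")  # 30.0
```

Of the 600,000 new mutations entering this hypothetical population each generation, only about 30 are expected to fix; the rest are eventually lost to drift.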
Slide41
Population size matters
We’ve seen that in larger populations we have to go further back in time to find the common ancestor. Consequently there is more opportunity for:
- Mutation, increasing genetic diversity
- Recombination, decreasing correlation between alleles
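The link between population size and diversity can be made quantitative with the coalescence results from Slide19: two samples share a common ancestor about 2N generations ago on average, so the total branch length separating them is about 4N generations, and with mutation rate μ per site per generation the expected number of differences per site is about 4Nμ. This assumes a constant population size; the sizes and rate below are hypothetical.

```python
def expected_pairwise_differences(n, mu_per_site, n_sites):
    """Expected differences between two sequences, assuming a constant population of N diploids."""
    expected_tmrca = 2 * n                    # generations back to the common ancestor
    total_branch_length = 2 * expected_tmrca  # both lineages accumulate mutations
    return total_branch_length * mu_per_site * n_sites


for n in (1_000, 10_000):
    pi = expected_pairwise_differences(n, mu_per_site=1e-8, n_sites=10_000)
    print(f"N={n:>6}: expected differences over 10 kb = {pi:.2f}")
```

A tenfold larger population gives tenfold more expected diversity, because the lineages have ten times longer to accumulate mutations.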
Slide42
The power of population genetic inference from a large genome
The human genome is very large, and broken up into essentially independent chunks by recombination.
This gives us many observations of the ancestral process, and considerable power to understand ancestry. I will give two examples.
Slide43
An example
Li and Durbin, “Inference of human population history from individual whole-genome sequences”, Nature 2011
[Figure: inferred population history plotted against years in the past]
Idea: a single genome gives us many observations of the ancestral process. As in the bottleneck example, more coalescence => smaller population size.
Slide44
Human population history
The recent migration of the ancestors of Europeans out of Africa has led to small effective population sizes.
Slide45
Differences between populations
The overall pattern of LD is conserved
The different ancestral histories lead to different levels of LD
Slide46
Population genetics
Genetic drift generates correlations between alleles. Recombination breaks them down.
The ancestral population size and history determine the amount of diversity and how it is structured.
Natural selection can generate strong differences between populations.
Slide47
Real populations are more complex: admixture
http://admixturemap.paintmychromosomes.com
Slide48
Real populations are more complex: natural selection
When a beneficial mutation arises, it can spread quickly through the population, generating strong correlations between alleles.
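A deterministic sketch of why a favoured allele spreads quickly, ignoring drift (a simplifying assumption: in reality many new beneficial mutations are still lost while rare). Under haploid selection with selection coefficient s, the frequency update is p' = p(1 + s)/(1 + ps), so the odds p/(1 - p) grow by a factor (1 + s) each generation. The values of s and 2N are hypothetical.

```python
import math


def sweep_generations(two_n, s):
    """Generations for a favoured allele to go from 1/2N to at least 1 - 1/2N (no drift)."""
    p = 1 / two_n
    generations = 0
    while p < 1 - 1 / two_n:
        p = p * (1 + s) / (1 + p * s)
        generations += 1
    return generations


two_n, s = 20_000, 0.05
steps = sweep_generations(two_n, s)
# Odds multiply by (1 + s) per generation, giving a closed-form check
predicted = 2 * math.log(two_n - 1) / math.log(1 + s)
print(steps, round(predicted, 1))
```

A few hundred generations is short compared with the time drift alone takes to carry a neutral allele to fixation (roughly 4N generations for one that does fix), so the neighbouring alleles that hitchhike with the sweep have little time to be separated by recombination.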
Slide49
Natural selection
Big differences in the patterns of diversity between populations can be generated by natural selection
Slide50
Differences between populations
Big differences in the patterns of diversity between populations can be generated by natural selection
Slide51
Yoruba from Ibadan, Nigeria
Utah residents, ancestrally Northern and Western European
24 haplotypes (12 individuals), 100 SNPs on chromosome 20
Slide52
Differences in patterns of LD
An experiment:
Take genome-wide SNP data collected from a European population (A).
For each SNP, find the SNP that is most correlated with it (and remember how correlated it is).
Go to another European population (B) and compare the correlation between the same two SNPs in the new population.
(Measure correlation as r².)
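The experiment can be sketched in a few lines. The helper `best_tag_transfer` is hypothetical: it assumes both populations are typed at the same SNPs, coded as 0/1 alleles on phased haplotypes, and it only searches a window of neighbouring SNPs (an assumption to keep the search cheap, rather than genome-wide). The two tiny haplotype sets are made up.

```python
def r_squared(x, y):
    """Squared correlation between two 0/1 allele vectors (0 if either is monomorphic)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x, var_y = mean_x * (1 - mean_x), mean_y * (1 - mean_y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov * cov / (var_x * var_y)


def best_tag_transfer(haps_a, haps_b, window=10):
    """For each SNP, find its best-correlated neighbour in population A and
    report (snp, tag, r2 in A, r2 in B). Rows are haplotypes, columns SNPs."""
    cols_a = list(zip(*haps_a))
    cols_b = list(zip(*haps_b))
    n_snps = len(cols_a)
    results = []
    for i in range(n_snps):
        best_j, best_r2 = None, -1.0
        for j in range(max(0, i - window), min(n_snps, i + window + 1)):
            if j == i:
                continue
            r2 = r_squared(cols_a[i], cols_a[j])
            if r2 > best_r2:
                best_j, best_r2 = j, r2
        if best_j is not None:
            results.append((i, best_j, best_r2, r_squared(cols_b[i], cols_b[best_j])))
    return results


# Hypothetical haplotypes (0/1 alleles) at the same three SNPs in two populations
pop_a = [[0, 0, 1], [0, 0, 0], [1, 1, 0], [1, 1, 1]]
pop_b = [[0, 0, 1], [0, 1, 0], [1, 1, 0], [1, 1, 1]]
results = best_tag_transfer(pop_a, pop_b)
for snp, tag, r2_a, r2_b in results:
    print(f"SNP {snp}: best tag SNP {tag}, r2 = {r2_a:.2f} in A, {r2_b:.2f} in B")
```

In this toy data SNP 1 is a perfect tag for SNP 0 in population A (r² = 1) but captures only a third of the variation in population B.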
Slide53
Differences in patterns of LD
[Figure panels: Across Europe; Within Kenya]
We will look at this in the practical
Slide54
Thanks!
Slide55
Recombination and physical distance
[Figure: pairs of SNPs at increasing distances, with r² = 1, 0.9, 0.5 and 0.1]
Correlations decay with distance (due to recombination)
Slide56
Looking at patterns of LD
[Figure: LD plot shaded from low r² to high r²]
LD patterns are complicated
Assume similar physical spacing
Slide57
Recombination clusters along chromosomes
Studies have shown that recombination is not uniform along chromosomes
Slide58
The power of population genetic inference from a large genome
Slide59
Yoruba from Ibadan, Nigeria
Utah residents, ancestrally Northern and Western European
24 haplotypes (12 individuals), 100 SNPs on chromosome 20
Slide60
LD and Recombination
There are lots of ways to measure LD.
Recombination is not uniform along chromosomes.
Much of the recombination happens in hotspots, and these demarcate breakdowns in correlation.
Correlations can still persist across hotspots.
Slide61
Differences between populations
The overall pattern of LD is conserved
The different ancestral histories lead to different levels of LD
Slide62
Population structure in Africa
There is evidence for widespread population structure across Africa
Slide63
Population structure in Africa
There are also population differences between groups from the same region
Slide64
Luhya in Webuye, Kenya
Maasai in Kinyawa, Kenya
24 haplotypes (12 individuals), 100 SNPs on chromosome 20
Slide65
Slide66
LD terminology
‘Causal’ variant – a variant that has a functional effect on a trait (such as disease).
Linkage disequilibrium – the pattern of correlations between alleles along a chromosome.
Tag SNP – a SNP that is in LD with a variant of interest (and that we may have typed directly).
Slide67
Summary
Different ancestral histories have led to different patterns of diversity.
Natural selection can generate strong differences in haplotype patterns.
Population structure across Africa, and between groups in Africa, will lead to differences in the structure of LD
Slide68
Slide69
Slide70
Genetic drift
Allele frequencies change by chance over time
Slide71
Genetic diversity
180 haplotypes (90 individuals) from Luhya in Webuye, Kenya, typed at 6856 SNPs in a 10 Mb region on chromosome 20