BASIC of GENETICS
What you need to know
Ahmed Rebai
Ahmed.rebai@cbs.rnrt.tn
DNA: the code of life
DNA is a molecule made of four building blocks (the nucleotides A, C, G and T).
Living cells and organisms carry DNA within them.
DNA contains the ‘text’ of life.
DNA
From DNA to protein
DNA
Parts of DNA are CODING (they give proteins): this is only a few percent (about 3%) of the human genome, but roughly 70% of the yeast genome.
Parts of DNA are NON-CODING:
introns
regulatory regions of genes
other (‘junk’ DNA!)
GENE
Gene: a section of DNA that codes for a protein; the protein contributes to a trait.
A chromosome is a ‘chunk’ of DNA, and genes are parts of chromosomes.
Genes … alleles
Because we have a pair of each chromosome, we have two copies of each gene.
These two copies can be identical in sequence or different: the different forms are called ALLELES.
Alleles can yield different phenotypes.
ALLELE
Allele: one of the different ‘options’ for a gene.
Example: attached and unattached earlobes result from different alleles of the gene for earlobe shape.
DOMINANT/RECESSIVE
Dominant: an allele that blocks or hides a recessive allele.
Recessive: an allele that is blocked or hidden by a dominant allele.
GENOTYPE
Genotype: a person’s set of alleles (gene options).
Genotypes can be noted by:
two letters denoting the alleles: AA, AB, BB, or, for single-nucleotide variants, for example AA, AG, GG;
a digit: 1, 2, 3 (one label per genotype), or 0, 1, 2 (the number of copies of a chosen reference allele, e.g. AA = 2, AG = 1, GG = 0).
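The 0/1/2 coding amounts to counting copies of the chosen reference allele. A minimal sketch, assuming genotypes are stored as two-letter strings such as "AG" (the helper name `dosage` is hypothetical):

```python
def dosage(genotype: str, ref_allele: str) -> int:
    """Return the number of copies (0, 1 or 2) of ref_allele in a two-letter genotype."""
    if len(genotype) != 2:
        raise ValueError(f"expected two alleles, got {genotype!r}")
    return genotype.count(ref_allele)


# Reference allele A: AA -> 2, AG -> 1, GG -> 0
for g in ["AA", "AG", "GG"]:
    print(g, dosage(g, "A"))
```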
HOMOZYGOUS/HETEROZYGOUS
Homozygous: when a person’s two alleles for a gene are the same.
Heterozygous: when a person’s two alleles for a gene are different.
You get one allele from your mom and one from your dad.
If you get the same allele from your mom and your dad, you are homozygous for that gene.
If your mom gave you a different allele than your dad, you are heterozygous for that gene.
PHENOTYPE
Phenotype: a person’s observable features, resulting from their genotype.
What you look like (your phenotype) is based on your genotype (your genes).
Segregation: lessons from peas
Mendel (1822-1884), working in the monastery of St. Thomas in the town of Brno (Brünn), now in the Czech Republic, discovered the laws of inheritance through a series of experiments on garden peas in 1856-1863.
Sexual reproduction
Mendelian genetics: the laws
Segregation
Segregation rules
1. Genes come in pairs, which means that a cell or individual has two copies (alleles) of each gene.
2. For each pair of genes, the alleles may be identical (homozygous WW or homozygous ww), or they may be different (heterozygous Ww).
3. Each reproductive cell (gamete) produced by an individual contains only one allele of each gene (that is, either W or w).
4. In the formation of gametes, any particular gamete is equally likely to include either allele (hence, from a heterozygous Ww genotype, half the gametes contain W and the other half contain w).
5. The union of male and female reproductive cells is a random process that reunites the alleles in pairs.
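Rules 3 to 5 can be checked by simulation. The sketch below (hypothetical helper names, Python standard library only) draws one allele at random from each parent for every offspring; for a Ww x Ww cross the genotype proportions should approach the 1:2:1 ratio.

```python
import random
from collections import Counter


def gamete(genotype: str, rng: random.Random) -> str:
    """Rules 3-4: a gamete carries one allele, each parental allele with probability 1/2."""
    return rng.choice(genotype)


def simulate_cross(parent1: str, parent2: str, n_offspring: int, seed: int = 1) -> dict[str, float]:
    """Rule 5: each offspring is formed by random union of one gamete from each parent."""
    rng = random.Random(seed)
    counts = Counter(
        # sorted() puts the uppercase allele first, so 'wW' and 'Ww' count as one genotype
        "".join(sorted(gamete(parent1, rng) + gamete(parent2, rng)))
        for _ in range(n_offspring)
    )
    return {geno: n / n_offspring for geno, n in sorted(counts.items())}


print(simulate_cross("Ww", "Ww", 100_000))  # close to WW 0.25, Ww 0.50, ww 0.25
```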
Mendel’s first law
The Principle of Segregation: in the formation of gametes, the paired hereditary determinants separate (segregate) in such a way that each gamete is equally likely to contain either member of the pair.
Recombination
Mendel studied the co-segregation of two genes by crossing: Wrinkled and Green x Round and Yellow.
Mendel’s second law
The Principle of Independent Assortment: segregation of the members of any pair of alleles is independent of the segregation of other pairs in the formation of reproductive cells.
This is of course valid only for unlinked genes.
Recombination
When two genes are linked (close together on the same chromosome), they do not segregate independently; the frequencies of genotypes in the progeny depend on the distance between the genes.
Multiple genes for a phenotype: polygenic traits
Continuous scale for a phenotype
Let us exercise
What are the genotypes produced by the following matings, and their frequencies?
AA x AA
AA x Aa
AA x aa
Aa x Aa
Aa x aa
aa x aa
What are the frequencies of the two-gene genotypes from this mating: AABb x AaBB?
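One way to check answers to these exercises is to enumerate the four equally likely gamete combinations for each gene and, assuming the two genes are unlinked (independent assortment), multiply the single-gene frequencies. A sketch with hypothetical function names, assuming genotypes are written as strings like "AABb" (first two letters for gene A, last two for gene B):

```python
from fractions import Fraction
from itertools import product


def one_gene_cross(p1: str, p2: str) -> dict[str, Fraction]:
    """Offspring genotype frequencies for one gene: 4 equally likely allele combinations."""
    out: dict[str, Fraction] = {}
    for a1, a2 in product(p1, p2):
        geno = "".join(sorted(a1 + a2))  # uppercase allele first: 'aA' -> 'Aa'
        out[geno] = out.get(geno, Fraction(0)) + Fraction(1, 4)
    return out


def two_gene_cross(p1: str, p2: str) -> dict[str, Fraction]:
    """Assumes unlinked genes: multiply single-gene frequencies (independent assortment)."""
    gene_a = one_gene_cross(p1[:2], p2[:2])
    gene_b = one_gene_cross(p1[2:], p2[2:])
    return {ga + gb: fa * fb for (ga, fa), (gb, fb) in product(gene_a.items(), gene_b.items())}


for geno, freq in sorted(two_gene_cross("AABb", "AaBB").items()):
    print(geno, freq)  # AABB 1/4, AABb 1/4, AaBB 1/4, AaBb 1/4
```

The same `one_gene_cross` function answers the single-gene matings, e.g. `one_gene_cross("Aa", "Aa")`.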
Population Genetics
Basic concepts and theories
Probability in population genetics
Consider the offspring of the mating Aa x Aa.
The addition rule:
Pr(an offspring has at least one A allele) = Pr(A-) = Pr(AA or Aa) = Pr(AA) + Pr(Aa) = 1/4 + 1/2 = 3/4
For any two mutually exclusive events A and B: Pr(A or B) = Pr(A) + Pr(B)
The multiplication rule:
Pr(two offspring each having at least one A allele) = Pr(A- and A-) = Pr(A-) x Pr(A-) = 3/4 x 3/4 = 9/16
For any two independent events A and B: Pr(A and B) = Pr(A) x Pr(B)
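The two rules can be verified by brute-force enumeration with exact fractions. This sketch lists the four equally likely offspring of Aa x Aa and then all 16 ordered pairs of offspring:

```python
from fractions import Fraction
from itertools import product

# Offspring of Aa x Aa: the four equally likely gamete combinations.
offspring = ["".join(sorted(a + b)) for a, b in product("Aa", "Aa")]
p_each = Fraction(1, len(offspring))

# Addition rule: AA and Aa are mutually exclusive outcomes, so their probabilities add.
p_AA = p_each * offspring.count("AA")
p_Aa = p_each * offspring.count("Aa")
p_A_dash = p_AA + p_Aa

# Multiplication rule: two children are independent, so Pr(A- and A-) = Pr(A-) * Pr(A-).
p_two = p_A_dash * p_A_dash

# Cross-check by enumerating all ordered pairs of children directly.
pairs = list(product(offspring, repeat=2))
p_two_enum = Fraction(sum("A" in c1 and "A" in c2 for c1, c2 in pairs), len(pairs))

print(p_A_dash, p_two, p_two_enum)  # 3/4 9/16 9/16
```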
Exercise
Two individuals with genotypes Aa and Aa married and had three children; what is the probability that exactly one of their children has the genotype aa?
Pr(aa and (AA or Aa) and (AA or Aa)) = Pr(aa) x Pr(A-) x Pr(A-) = 1/4 x 3/4 x 3/4 = 9/64
But since the aa child has three possible birth orders, we should multiply by 3, giving 27/64.
Compute the same probability for the case of two children. (Answer: 6/16; for 4 children this is also 27/64.)
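The birth-order factor is the binomial coefficient: the probability that exactly k of n children have a genotype of probability p is $\binom{n}{k} p^k (1-p)^{n-k}$. A short sketch with exact fractions (the function name is illustrative):

```python
from fractions import Fraction
from math import comb


def prob_exactly(k: int, n: int, p: Fraction) -> Fraction:
    """Binomial probability that exactly k of n independent children show the event.

    comb(n, k) counts the possible birth orders of the k affected children.
    """
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_aa = Fraction(1, 4)  # Pr(aa) for a child of Aa x Aa
for n in (2, 3, 4):
    print(n, prob_exactly(1, n, p_aa))  # 2 3/8, then 3 27/64, then 4 27/64
```

Note that 6/16 is printed in lowest terms as 3/8.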
Organization of genetic variation
A population is a group of organisms of the same species living within a sufficiently restricted geographical area that any member can potentially mate with any other member (of the opposite sex).
Population subdivision can be due to geographic constraints as well as to social behaviour.
Local populations (by country, town, ...): groups of individuals that can interbreed, also called subpopulations or Mendelian populations.
Genetic variation
Phenotypic diversity in natural populations is impressive and is due to genetic variation: multiple alleles for many genes affecting the phenotype.
Population genetics is concerned with describing how alleles are organized into genotypes and with determining whether alleles of the same or of different genes are associated at random.
Allele frequencies in populations
Allele frequency is the proportion, among all alleles of the gene in the population, of those that are of the specified type.
Since populations are large, allele frequencies are estimated from a population sample.
Consider a gene with genotypes AA, Aa and aa, and a sample of N individuals. We count the number of individuals that have the AA, Aa and aa genotypes (denoted $N_{AA}$, $N_{Aa}$ and $N_{aa}$, respectively), and we estimate the frequency of allele A by the proportion of A alleles among all 2N alleles in the sample:
$$p_A = \frac{2N_{AA} + N_{Aa}}{2N}, \qquad p_a = 1 - p_A$$
Example
In a sample of 1000 individuals, 298 were of genotype MM, 489 MN and 213 NN, so the frequency of allele M is
$$p_M = \frac{2 \times 298 + 489}{2 \times 1000} = 0.54$$
We can compute a 95% confidence interval for the frequency based on the binomial law and the normal approximation:
$$p_M \pm 1.96\sqrt{\frac{p_M(1 - p_M)}{2N}}$$
This approximation is only valid for frequencies that are not too small (>0.1) and not too high (<0.9).
In the example we get [0.52 ; 0.56].
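The estimate and its normal-approximation interval can be computed directly from the genotype counts. A minimal sketch (function names are hypothetical), using z = 1.96 and 2N = 2000 sampled alleles:

```python
import math


def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """p_A = (2 N_AA + N_Aa) / (2N): share of A among the 2N sampled alleles."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)


def wald_ci(p: float, n_alleles: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% CI; reasonable only for roughly 0.1 < p < 0.9."""
    half_width = z * math.sqrt(p * (1 - p) / n_alleles)
    return p - half_width, p + half_width


p_M = allele_frequency(298, 489, 213)
low, high = wald_ci(p_M, 2 * 1000)
print(f"p_M = {p_M:.4f}, 95% CI = [{low:.2f} ; {high:.2f}]")  # p_M = 0.5425, 95% CI = [0.52 ; 0.56]
```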
For rare alleles
For rare alleles (frequency below 1%), there is a chance that a sample does not contain any carrier of the allele, so the estimated frequency will be 0.
An alternative is to use empirical Bayes estimation: $p = (k+2)/(n+4)$, where k is the observed number of copies of the allele in the sample and n the total number of alleles. This is the posterior mean under a Beta(2,2) prior; a uniform (Beta(1,1)) prior gives $(k+1)/(n+2)$.
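To see the effect on a rare allele that was never observed, the sketch below compares the naive count with two Bayesian posterior means. The Beta-prior labels follow from the standard result that a Beta(a, b) prior combined with a binomial sample gives the posterior mean (k + a)/(n + a + b); the function name and the sample size are illustrative.

```python
from fractions import Fraction


def estimates(k: int, n: int) -> dict[str, Fraction]:
    """Compare estimators for an allele seen k times among n sampled alleles."""
    return {
        "count k/n": Fraction(k, n),
        "posterior mean, Beta(1,1) prior": Fraction(k + 1, n + 2),
        "(k+2)/(n+4), Beta(2,2) prior": Fraction(k + 2, n + 4),
    }


# A rare allele absent from a sample of 100 individuals (200 alleles):
for name, value in estimates(0, 200).items():
    print(f"{name:32s} {float(value):.4f}")
```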
Random mating
Random mating means that any two individuals (of opposite sex) have the same probability of mating. Genotypes then meet each other with the same probabilities as if mating pairs were formed by random collision of genotypes.
Random mating can apply to some genes, like those controlling blood groups or neutral polymorphisms, but not to others, like those controlling skin color or height.
Non-overlapping generations
Formally, this means that the cycle of birth, maturation and death includes the death of all individuals present in each generation before the next generation matures.
This is only an approximation (a simplistic one in humans), but it works well as far as genotype frequencies are concerned.
The Hardy-Weinberg principle
If we assume that:
the organism is diploid;
reproduction is sexual;
generations are non-overlapping;
allele frequencies are identical in males and females;
the population is of large size;
mating is random;
migration and mutation are negligible;
natural selection does not affect the alleles
Then...
Genotype frequencies can be deduced from allele frequencies (p is the frequency of allele A, q = 1 - p that of allele a):
AA: $p^2$, Aa: $2pq$, aa: $q^2$
These frequencies (allelic and genotypic) remain the same over generations: we say that the population is in Hardy-Weinberg Equilibrium (HWE).
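The statement that HW proportions are reached and then maintained can be illustrated with a deterministic recursion. This sketch assumes an idealized, infinitely large population (so drift is ignored) with equal allele frequencies in both sexes, and starts from a population with no heterozygotes at all:

```python
def next_generation(P_AA: float, P_Aa: float, P_aa: float) -> tuple[float, float, float]:
    """One round of random mating: random union of gametes gives p², 2pq, q²."""
    p = P_AA + P_Aa / 2  # allele frequency of A in the gamete pool
    q = 1 - p
    return p * p, 2 * p * q, q * q


# Start far from HWE: no heterozygotes at all, p = 0.6.
gen = (0.6, 0.0, 0.4)
for t in range(3):
    gen = next_generation(*gen)
    print(t + 1, [round(x, 4) for x in gen])  # each generation: [0.36, 0.48, 0.16]
```

One generation of random mating is enough; afterwards the frequencies no longer change.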
Why?
Implications of HWE
Despite very restrictive and unrealistic assumptions, HWE offers a reference model in which there are no evolutionary forces at work other than those imposed by the process of reproduction itself (like a mechanical model of a falling object with no force acting on it other than gravity).
The HW model separates the life cycle into two phases: gametes -> zygote and zygote -> adult.
Even if the assumption of non-overlapping generations is not true, HWE will be attained gradually.
HWE also applies to multiallelic genes.
Implications of HWE
Application of HWE
We can calculate the number of carriers of a rare mutation in the population.
Example: in the European population, the frequency of cystic fibrosis patients is known to be 1 in 1700 ($q = \sqrt{1/1700} \approx 0.024$), so the frequency of heterozygotes is (due to HWE) $2pq \approx 5\%$.
So when an allele is very rare, most of the genotypes containing this allele are heterozygous.
Show that for a rare allele of frequency 1/1000, there are about 2000 times more heterozygotes than recessive homozygotes.
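Both calculations follow from $q^2$ being the frequency of affected (recessive homozygous) individuals. A small sketch, assuming the 1/1700 incidence quoted for cystic fibrosis (the helper name is hypothetical):

```python
import math


def carrier_frequency(disease_incidence: float) -> tuple[float, float]:
    """Under HWE the recessive homozygote frequency is q², so q = sqrt(incidence)."""
    q = math.sqrt(disease_incidence)
    p = 1 - q
    return q, 2 * p * q


q, het = carrier_frequency(1 / 1700)
print(f"q = {q:.3f}, heterozygotes = {het:.1%}")  # q = 0.024, heterozygotes = 4.7%

# Ratio of heterozygotes to recessive homozygotes: 2pq / q² = 2p / q.
q_rare = 1 / 1000
print(2 * (1 - q_rare) / q_rare)  # about 2000 (1998 up to float rounding)
```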
HWE deviation
Deviation from HWE can be due to inbreeding, population stratification, selection, gender-dependent allele frequencies, or non-random (assortative) mating.
The principle does not apply directly to X-linked or Y-linked genes.
Tests of HWE
Compare observed to expected genotype counts using Pearson’s chi-square goodness-of-fit test: with 3 genotypes and 1 estimated parameter (p), the test has 1 df.
This test is inappropriate for rare variants (low genotype counts): use Fisher’s Exact Test (FET) instead.
Other exact tests are available in the R language (e.g. the genetics package, ...).
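Pearson chi-square through D
Let $D_A = P_{AA} - p^2$, where $P_{AA}$ is the observed frequency of genotype AA and p the frequency of allele A.
Testing HWE is testing $D_A = 0$, with the statistic
$$\chi^2 = \frac{N D_A^2}{p^2(1-p)^2}$$
Compute p-value = $\Pr(\chi^2_{1\,df} > \chi^2_{obs})$.
If p-value < 0.05 (or 0.0001), then there is deviation from HWE.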
Tests of HWE: let’s do it!
Example: in a sample of 1000 individuals, 298 were of genotype MM, 489 MN and 213 NN.
Genotypes:        MM     MN     NN
Observed counts:  298    489    213
Expected counts:  294.3  496.4  209.3
$p_M = 0.54$; observed $P_{MM} = 0.298$ and expected $p_M^2 = 0.294$, so $D = 0.298 - 0.294 = 0.004$
$\chi^2 = N D^2 / (p(1-p))^2 = 1000 \times (0.004 / (0.54 \times 0.46))^2 \approx 0.26 < 3.84$; p-value = 0.61
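The D-based statistic and the classical Pearson sum over the three genotypes give the same value. The sketch below computes both from the MM/MN/NN counts and obtains the 1-df p-value from the complementary error function, since $\Pr(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$. Without rounding p and D it gives χ² ≈ 0.22 and p ≈ 0.64, close to the rounded hand calculation; function names are hypothetical.

```python
import math


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> tuple[float, float, float]:
    """Return (D_A, chi-square, p-value) for the 1-df HWE goodness-of-fit test."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    d = n_AA / n - p * p                      # D_A = P_AA - p²
    chi2 = n * d * d / (p * q) ** 2           # chi² = N D_A² / (p q)²
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi² with 1 df
    return d, chi2, p_value


def pearson_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Classical sum over genotypes of (observed - expected)² / expected."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    expected = [n * p * p, 2 * n * p * (1 - p), n * (1 - p) ** 2]
    observed = [n_AA, n_Aa, n_aa]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


d, chi2, pval = hwe_chi_square(298, 489, 213)
print(f"D = {d:.4f}, chi2 = {chi2:.2f} (Pearson: {pearson_chi_square(298, 489, 213):.2f}), p = {pval:.2f}")
```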
Haplotypes from genotypes
If we study many genes, they can be linked, and one can use haplotypes.
A haplotype (haploid genotype) is the set of alleles carried by one chromosome at several genes.
Consider two genes (A, a) and (B, b) with allele frequencies ($p_A$, $p_a$) and ($p_B$, $p_b$).
If the gametic frequencies are products of the allele frequencies:
AB: $p_A p_B$, Ab: $p_A p_b$, aB: $p_a p_B$, ab: $p_a p_b$,
we say that the genes are in random association, or in linkage equilibrium.
Linkage disequilibrium
If the observed frequency of a gamete (e.g. $P_{AB}$) differs from that expected under linkage equilibrium ($p_A p_B$), we say that the genes are in Linkage Disequilibrium (LD).
To measure and test LD, we need to know the haplotype frequencies.
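Linkage Disequilibrium
[Figure: two SNPs with allele frequencies 40%/60% (SNP1) and 30%/70% (SNP2). With no LD, the four haplotype frequencies are the products of the allele frequencies (12%, 28%, 18%, 42%); with LD, they depart from these products (the panel shows 60%, 30% and 10%).]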
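LD measures: D
D is the difference between the observed and expected haplotype frequencies:
$$D = P_{AB} - p_A p_B$$
It is also equal to
$$D = P_{AB} P_{ab} - P_{Ab} P_{aB}$$
D is bounded between $D_{min} = \max(-p_A p_B, -p_a p_b)$ and $D_{max} = \min(p_A p_b, p_a p_B)$.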
D’: standardized D
In practice, choose alleles A and B such that D > 0 and $p_A > p_B$; then $D_{max} = p_a p_B$, and a standardized measure of LD is:
$$D' = \frac{D}{D_{max}}$$
D’ = 1 denotes complete LD.
The r² measure: more practical
This is the squared correlation from the 2x2 contingency table of haplotype counts:
$$r^2 = \frac{D^2}{p_A p_a p_B p_b}$$
Testing LD
We can show that $N r^2$, with N the number of haplotypes, is a chi-square statistic for testing LD (1 df).
Exercise: two blood group systems, M/N and S/s, gave the following haplotypes (1000 individuals):
MS: 474, Ms: 611, NS: 142, Ns: 773
Allele frequencies are M: 0.54, S: 0.31.
Compute D, D’ and r², and test LD.
Solution: D = 0.07, D’ = 0.50, r² = 0.09, X² = N r² ≈ 185 (N = 2000 haplotypes), p < 10^-40
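A sketch computing D, D’, r² and the chi-square N r² from the four haplotype counts (N = number of haplotypes; function name hypothetical). It assumes the counts MS 474, Ms 611, NS 142, Ns 773, which sum to 2000 haplotypes from 1000 individuals and reproduce the allele frequencies M = 0.54 and S = 0.31; with these counts it returns roughly D ≈ 0.07, D’ ≈ 0.50, r² ≈ 0.09 and N r² ≈ 185.

```python
def ld_stats(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> dict[str, float]:
    """D, D' and r² from haplotype counts, plus the 1-df chi-square N r² (N = haplotypes)."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    p_a, p_b = 1 - p_A, 1 - p_B
    d = n_AB / n - p_A * p_B                 # D = P_AB - p_A p_B
    if d >= 0:
        d_max = min(p_A * p_b, p_a * p_B)
    else:
        d_max = min(p_A * p_B, p_a * p_b)    # |D_min|, so D' keeps the sign of D
    r2 = d * d / (p_A * p_a * p_B * p_b)
    return {"D": d, "D'": d / d_max, "r2": r2, "chi2": n * r2}


stats = ld_stats(474, 611, 142, 773)  # MS, Ms, NS, Ns
print({k: round(v, 3) for k, v in stats.items()})
```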
Causes of LD
LD is ‘created by linkage’.
If r is the recombination rate between the two genes, we can show that LD at generation t is given by
$$D_t = (1 - r)^t D_0$$
If r is small (genes very close on the chromosome), the decay is very slow and LD can persist for hundreds of generations.
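The decay formula also gives a ‘half-life’ for LD: D falls to half its value after $t = \ln 0.5 / \ln(1 - r)$ generations. A sketch (here r is the recombination fraction, not the r² measure of LD; function names are illustrative):

```python
import math


def ld_after(d0: float, r: float, t: int) -> float:
    """D_t = (1 - r)^t D_0, with r the recombination fraction per generation."""
    return d0 * (1 - r) ** t


def half_life(r: float) -> float:
    """Generations needed for D to fall to half its initial value."""
    return math.log(0.5) / math.log(1 - r)


for r in (0.5, 0.1, 0.01, 0.001):
    print(f"r = {r:<6} half-life = {half_life(r):8.1f} generations, D_100/D_0 = {ld_after(1.0, r, 100):.3f}")
```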
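Recombination and LD
[Figure: a double heterozygote AB/ab transmits each parental gamete (AB, ab) with frequency (1-r)/2 and each recombinant gamete (Ab, aB) with frequency r/2.]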
Decay of LD over generations
Admixture of populations
LD can be created by the merging of populations having different gametic frequencies.
Consider two populations and two genes in linkage equilibrium in both, where alleles A and B have frequencies 0.05 in the first population and 0.95 in the second population.
A new population is formed by an equal mixture of the two populations. Show that LD is high in that population (D = 0.2 and D’ = 0.81).
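The admixture exercise can be checked numerically. The sketch assumes, as in the exercise, that A and B share the same frequency within each source population and that each source population is in linkage equilibrium, so that $P_{AB} = p^2$ there (function name hypothetical):

```python
def mix_populations(p1: float, p2: float, weight: float = 0.5) -> tuple[float, float]:
    """D and D' after mixing two populations that are each in linkage equilibrium.

    Assumes alleles A and B both have frequency p1 in population 1 and p2 in population 2.
    """
    p_A = p_B = weight * p1 + (1 - weight) * p2
    # Within each source population P_AB = p_i², so the mixture averages these values.
    P_AB = weight * p1 ** 2 + (1 - weight) * p2 ** 2
    d = P_AB - p_A * p_B
    d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)  # bound for D > 0
    return d, d / d_max


d, d_prime = mix_populations(0.05, 0.95)
print(f"D = {d:.4f}, D' = {d_prime:.2f}")  # D = 0.2025, D' = 0.81
```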
Admixture
Natural (Darwinian) selection
Individuals differ in their ability to survive and reproduce, owing in part to their genotype.
The selective advantage or disadvantage is measured by fitness.
Selection results in a change of allele frequencies over generations and in deviation from HWE.
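The slides do not give a formula for the change in allele frequency; a common textbook model is one-locus viability selection, where genotypes AA, Aa and aa have fitnesses $w_{AA}$, $w_{Aa}$, $w_{aa}$ and $p' = (p^2 w_{AA} + pq\,w_{Aa})/\bar{w}$. A minimal sketch of that model (the fitness values in the example are hypothetical):

```python
def select_one_generation(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Allele frequency of A after one round of viability selection.

    Assumes HWE proportions at birth (p², 2pq, q²) and genotype fitnesses w_AA, w_Aa, w_aa.
    """
    q = 1 - p
    w_bar = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa  # mean fitness
    return (p * p * w_AA + p * q * w_Aa) / w_bar


# Hypothetical example: allele a is recessive lethal (w_aa = 0), starting at q = 0.5.
p = 0.5
for gen in range(1, 6):
    p = select_one_generation(p, 1.0, 1.0, 0.0)
    print(gen, round(1 - p, 4))  # q falls as 1/3, 1/4, 1/5, 1/6, 1/7
```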
Effect of selection
Random Genetic Drift
In each generation, chance operates in the drawing of the gametes that will unite to form the next generation.
This chance can result in a random change in allele frequency and may ultimately lead to the fixation or elimination of some alleles.
Simply saying
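Mathematical models of drift
Wright-Fisher model (1930): in a population of N diploid individuals (2N gene copies), the probability of obtaining k copies of an allele that had frequency p in the last generation is:
$$\Pr(k) = \binom{2N}{k} p^k q^{2N-k}, \qquad q = 1 - p$$
The expected time before a neutral allele becomes fixed through genetic drift (given that it does become fixed) is:
$$\bar{T}_{fixed} = \frac{-4N_e (1-p)\ln(1-p)}{p}$$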
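The two formulas, plus a direct simulation of the Wright-Fisher sampling process, in a short sketch (Python standard library only; function names, population sizes and seed are arbitrary choices for illustration):

```python
import math
import random


def wf_prob(k: int, n_individuals: int, p: float) -> float:
    """Pr(k copies of the allele among the 2N genes of the next generation): binomial."""
    two_n = 2 * n_individuals
    return math.comb(two_n, k) * p**k * (1 - p) ** (two_n - k)


def expected_fixation_time(p: float, n_e: int) -> float:
    """Mean time to fixation of a neutral allele at frequency p, given that it fixes."""
    return -4 * n_e * (1 - p) * math.log(1 - p) / p


def simulate_drift(p: float, n_individuals: int, generations: int, seed: int = 0) -> list[float]:
    """Wright-Fisher simulation: each generation draws 2N genes from the previous gene pool."""
    rng = random.Random(seed)
    two_n = 2 * n_individuals
    trajectory = [p]
    for _ in range(generations):
        k = sum(rng.random() < p for _ in range(two_n))  # one binomial draw
        p = k / two_n
        trajectory.append(p)
        if p in (0.0, 1.0):  # allele lost or fixed: no further change
            break
    return trajectory


print(sum(wf_prob(k, 10, 0.3) for k in range(21)))  # 1 up to float rounding
print(expected_fixation_time(1 / (2 * 100), 100))   # close to 4N = 400 for a new mutation
print(simulate_drift(0.5, 10, 50))
```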
Population bottleneck
Founder effect
Population substructure
A population is substructured when it is organized into several subpopulations having different genetic composition (allele frequencies).
Substructure generally results in a reduction of heterozygote frequency relative to that expected with random mating (Wahlund principle).
Several measures can be used to assess population substructure: F-statistics.
F-statistics
Defined by Wright (1921):
$$(1 - F_{IT}) = (1 - F_{IS})(1 - F_{ST})$$
Another formulation
The most useful index for testing substructure is $F_{ST}$, which measures the level of genetic divergence among subpopulations:
$$F_{ST} = \frac{H_T - H_S}{H_T}$$
$H_S$: average heterozygosity among individuals within subpopulations
$H_T$: average heterozygosity among individuals within the total population
According to the variance of allele frequencies among subpopulations: $F_{ST} = \sigma^2_p / (\bar{p}(1 - \bar{p}))$
It can be calculated with the R package hierfstat.
$F_{ST}$ is not a genetic distance.
How to use it?
$F_{ST} = 1$ means total divergence by fixation of alternative alleles in the subpopulations.
$F_{ST}$ < 0.05: little differentiation
0.05 < $F_{ST}$ < 0.15: moderate
0.15 < $F_{ST}$ < 0.25: high
$F_{ST}$ > 0.25: very high
Chi-square test with 1 df: $X^2 = (k-1) N F_{ST}$
Examples: between European and sub-Saharan African: 0.15; Japanese-African: 0.19; Europeans: 0.11
Example
Two populations in which the allele frequencies are 0.5 and 0.3
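For this two-population example, $F_{ST}$ can be computed either from heterozygosities or from the variance of allele frequencies. This sketch assumes equal-sized subpopulations and one biallelic locus, with no sample-size correction (estimators such as those in hierfstat treat real samples differently); function names are hypothetical.

```python
def fst(freqs: list[float]) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic locus, equal-sized subpopulations."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-subpopulation heterozygosity
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                             # heterozygosity of the pooled population
    return (h_t - h_s) / h_t


def fst_from_variance(freqs: list[float]) -> float:
    """Same quantity written as Var(p) / (p_bar (1 - p_bar))."""
    p_bar = sum(freqs) / len(freqs)
    var_p = sum((p - p_bar) ** 2 for p in freqs) / len(freqs)
    return var_p / (p_bar * (1 - p_bar))


print(round(fst([0.5, 0.3]), 4), round(fst_from_variance([0.5, 0.3]), 4))  # 0.0417 0.0417
```

With $F_{ST} \approx 0.04$, these two populations fall in the "little differentiation" range.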
Admixture
Genetic admixture occurs when individuals from two or more previously separated populations begin interbreeding. Admixture results in the introduction of new genetic lineages into a population.
Most human populations are the product of mixture of genetically distinct groups that intermixed within the last 4,000 years.
Admixture detection
By testing HWE
Standard statistical methods applied to data on genotype and allele/haplotype frequencies: Principal Component Analysis (PCA), clustering (k-means, hierarchical, ...)
Advanced methods: maximum likelihood (psmix R package), Bayesian methods, wavelet analysis (adwave R package), STRUCTURE
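To illustrate PCA for detecting population structure, the sketch below simulates 0/1/2 genotypes for two hypothetical populations whose allele frequencies differ at every SNP, then takes principal components of the centred genotype matrix via SVD. It assumes NumPy; all frequencies and sample sizes are made up, and real analyses usually also standardize SNPs and filter linked markers.

```python
import numpy as np


def simulate_genotypes(freqs: np.ndarray, n_individuals: int, rng: np.random.Generator) -> np.ndarray:
    """0/1/2 allele counts for individuals drawn under HWE with the given per-SNP frequencies."""
    return rng.binomial(2, freqs, size=(n_individuals, freqs.size))


def pca_scores(genotypes: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Principal component scores of the centred genotype matrix (rows = individuals)."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


rng = np.random.default_rng(0)
n_snps = 200
freqs_pop1 = rng.uniform(0.1, 0.9, n_snps)
# Hypothetical second population with allele frequencies shifted at every SNP.
freqs_pop2 = np.clip(freqs_pop1 + rng.choice([-0.3, 0.3], n_snps), 0.01, 0.99)
geno = np.vstack([simulate_genotypes(freqs_pop1, 50, rng), simulate_genotypes(freqs_pop2, 50, rng)])
scores = pca_scores(geno)
print("mean PC1, pop1:", scores[:50, 0].mean().round(2), " pop2:", scores[50:, 0].mean().round(2))
```

The sign of a principal component is arbitrary; what matters is that the two groups fall on opposite sides of PC1.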
Principal component analysis
Clustering
STRUCTURE
Inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.
http://pritchardlab.stanford.edu/structure.html
Admixture
https://www.genetics.ucla.edu/software/admixture/
R packages
genetics: classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, ...
adegenet: classes and functions for genetic data analysis within the multivariate framework.
hierfstat: estimation of hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm …; functions are also given to test via randomisation the significance of each F and variance components.
Recommended readings