Slide1
A null model for phenotype-fitness landscapes and the distribution of mutation effects on fitness: using random matrix theory
Guillaume Martin, Institut des Sciences de l'Evolution (ISEM), UMR 5554, Université Montpellier II - CNRS - IRD
Martin G. (2014). « Fisher's geometrical model emerges as a property of complex integrated phenotypic networks », Genetics 197(1): 237-255.
Slide2
The DFE: definition
DFE (Distribution of Fitness Effects): distribution of the change in fitness produced by random mutations (sometimes refers to beneficial ones only).
[Figure: deleterious and beneficial mutations; proportion (%) and effect of each type]
Slide3
The DFE: implications
- Adaptation from de novo mutations: all models of adaptation from new alleles.
- Mutation-selection balance = null model of standing variance: distribution of fitness at equilibrium.
- Cost of adaptation and evolution in heterogeneous environments: DFE across environmental contexts.
- Epistasis/dominance and evolution of the genetic system (sex, inbreeding, etc.): DFE across genetic backgrounds.
Slide4
The DFE: implications
The challenge:
Given a set of genotypes, their initial frequencies, their fitnesses and the demographic stochasticity, population genetics => fate of the population (adaptation, demography).
Dynamical system with a fixed number of components vs. dynamical system with a dynamical number of components.
MUTATION => need to predict the above data for these « yet to appear » types ~ DFE.
Slide5
DFE: empirical measurement
Generating mutants:
- Single random mutations in a gene or genome (transposon inserts, site-directed mutagenesis, etc.)
- Gene deletion sets (cover ~ all genes)
- Mutation-accumulation experiments
Measuring mutant « fitness »:
- Survival, growth rate, competitive index => one or two mutants at a time
- Molecular tags/deep sequencing => joint growth of many mutants. See e.g. Hietpas et al. (2011) PNAS & (2013) Evolution.
Slide6
[Figure: empirical DFEs (beneficial, deleterious, lethal mutations) in five organisms]
VSV: Sanjuan & Elena PNAS 2004
TEV: Carrasco et al. 2007 J. Virol.
ΦX174: Domingo-Calap 2009 PLoS Genetics
E. coli: Elena et al. 1998 Genetica
Yeast: Szafraniec et al. 2003 Genetics
Special issue: Phil. Trans. R. Soc. Lond. B Biol. Sci. (2010). « The Good, the Bad and the Ugly »
Slide7
DFE: patterns
Surprisingly simple in some aspects:
- Majority = deleterious, average = deleterious
- Skewed to the right
- A portion of lethals
Why such apparent generalities?
But context-dependent:
- Epistasis is pervasive
- Variance (and mean to a lesser extent) depend on environment/context (e.g. sexes)
Predict this context-dependence?
Slide8
Two main research programs
1. Heuristic models: « top-down »
- Focus on the DFE or quantitative traits
- Set of simplifying assumptions
- Aim at generality
- Parameterized from the DFE itself: « effective » parameters
- Ex: Fisher's (1930) geometrical model (FGM) ~ Lande's (1980) models
2. Mechanistic models: « bottom-up »
- Focus on basic underlying traits (ex: metabolic fluxes in cells)
- Optimality assumptions
- Specific to particular known species x environment
- Set of empirically known + parameterized relationships
- Ex: Flux-Balance Analysis (FBA)
Slide9
1. Fisher's Geometrical Model (FGM), here in its anisotropic version
[Figure: fitness landscape over Trait 1 and Trait 2, showing the optimal phenotype, the parental phenotype z with fitness w_o, mutant phenotypes with fitness w_mut, and the selection coefficient s]
ASSUMPTIONS
- Existence of an optimum for a set of traits
- Quadratic/Gaussian fitness function
- Centered Gaussian effects of mutation on phenotype
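As a minimal sketch of the FGM assumptions listed above, the Python/NumPy code below draws random mutants around a parent and records their selection coefficients. The specific choices are ours, not taken from the slides: log fitness $-\tfrac{1}{2}\mathbf{z}^\top\mathbf{z}$ with the optimum at the origin, centred Gaussian mutation effects with covariance `M`, and $s = \log w_{mut} - \log w_o$. The function name `simulate_fgm_dfe` and all parameter values are hypothetical.

```python
import numpy as np


def simulate_fgm_dfe(z_parent, mut_cov, n_mut, rng):
    """Selection coefficients of random mutants under a quadratic FGM sketch.

    Assumed (illustrative): log w(z) = -0.5 * z.z with the optimum at 0,
    mutation effects dz ~ N(0, mut_cov), and s = log w_mut - log w_o.
    """
    z_parent = np.asarray(z_parent, dtype=float)
    dz = rng.multivariate_normal(np.zeros(z_parent.size), mut_cov, size=n_mut)
    z_mut = z_parent + dz
    log_w_parent = -0.5 * z_parent @ z_parent
    log_w_mut = -0.5 * np.sum(z_mut**2, axis=1)
    return log_w_mut - log_w_parent


rng = np.random.default_rng(1)
n = 6                          # number of traits under selection
z_o = np.zeros(n)
z_o[0] = 0.3                   # parent slightly away from the optimum
M = 0.01 * np.eye(n)           # isotropic mutational covariance
s = simulate_fgm_dfe(z_o, M, 100_000, rng)
print(f"mean s = {s.mean():.4f}, fraction deleterious = {(s < 0).mean():.2f}")
```

With the parent close to the optimum, most mutants come out deleterious and the mean of s is close to $-\tfrac{1}{2}\operatorname{tr} M$, in line with the patterns listed on Slide7.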
Slide10
1. Fisher's Geometrical Model (FGM)
Empirical validations:
- Gamma shape of DFEs in optimal conditions
- Epistasis distribution among random pairs of mutations
- Rate of adaptation across different genetic backgrounds
Still few tests: virus, E. coli, yeast, Drosophila.
Still issues about variation across genes and across environments.
Perfeito et al. Evolution (2013); Martin et al. Nature Genetics (2007)
Slide11
1. Fisher's Geometrical Model (FGM)
General criticism/limits: too heuristic/simple to be realistic
- Single optimum
- Only tests on predictions, not on premises
- « Traits » not satisfyingly defined
- No mechanism
Slide12
2. Flux-Balance Analysis (FBA)
Metabolic fluxes in a cell => growth rate
- Metabolic network => stoichiometric equations (all metabolites)
- Steady state (metabolic equilibrium)
- Linear programming: constrain the solution to optimize some criterion or set of criteria (growth rate, ATP yield, etc.)
DFE: remove a gene = remove a reaction in the system => new steady state = mutant fitness
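To illustrate the FBA logic above (steady state, linear programming, gene deletion as reaction removal), here is a minimal sketch using `scipy.optimize.linprog`. The four-reaction network, its flux bounds and the helper names `max_growth` and `deletion_fitness` are hypothetical, chosen only so that the effect of each deletion is easy to read; real FBA works on genome-scale stoichiometric matrices.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: metabolites A, B (rows); reactions (columns)
# v0: uptake -> A, v1: A -> B, v2: A -> B (alternative route), v3: B -> biomass.
S = np.array([
    [1, -1, -1, 0],   # A
    [0, 1, 1, -1],    # B
])
WT_BOUNDS = [(0, 10), (0, 6), (0, 4), (0, None)]   # flux bounds per reaction
OBJECTIVE = np.array([0, 0, 0, -1])                # linprog minimizes: negate biomass


def max_growth(bounds):
    """Maximal biomass flux at steady state (S v = 0) within the flux bounds."""
    res = linprog(OBJECTIVE, A_eq=S, b_eq=np.zeros(S.shape[0]),
                  bounds=bounds, method="highs")
    return max(0.0, -res.fun) if res.success else 0.0


def deletion_fitness(k):
    """Knock out reaction k (flux forced to 0) and return relative fitness."""
    bounds = list(WT_BOUNDS)
    bounds[k] = (0, 0)
    return max_growth(bounds) / max_growth(WT_BOUNDS)


for k in range(3):
    print(f"delete v{k}: relative fitness = {deletion_fitness(k):.2f}")
```

In this toy case deleting the uptake reaction v0 is lethal (relative fitness 0), whereas deleting either parallel route is only deleterious (0.4 and 0.6), because the remaining route still carries part of the flux.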
Slide13
2. Flux-Balance Analysis (FBA)
Empirical validations (reviewed in Harcombe et al. 2013, PLoS Comp. Biol.):
- Metabolic fluxes in model microbes, red blood cells
- Growth rates of gene deletion sets
- Epistasis between gene deletions
Fong & Palsson, Nature Genetics (2004); Harcombe et al., PLoS Comp. Biol. (2013)
Slide14
2. Flux-Balance Analysis (FBA)
General criticism/limits (for the DFE purpose):
- Too specific/mechanistic to be generalized: non-model species, complex/unknown metabolic environments, multicellular organisms
- Single optimum
- Effect of mutations other than gene deletions?
- Effects on things other than metabolism?
Slide15
Common points/differences
- Some genotype–phenotype–fitness mapping. FGM: a priori; FBA: empirically parameterized (partially)
- Optimization. FGM: at some (small) distance from the optimum; FBA: some trait is optimized (ATP, growth rate, etc.)
- Pleiotropy. FGM: assumed dimensionality (measurable from DFEs); FBA: each deletion changes many fluxes via the network
Can we link the two approaches?
[Diagram: systems biology and experimental evolution provide a set of general qualitative premises; a statistical treatment (laws of large numbers) yields the macroscopic outcome: a simplified phenotype–fitness mapping and the DFE]
Slide17
General Assumptions
1. Pleiotropy (many components jointly affected by each mutation)
2. Weak unbiased mutation effects on basic functions (local analysis around the parent phenotype)
3. Phenotypic integration (pyramidal integration from basic functions to fitness)
4. Optimization (existence of a locally unique optimum)
Slide18
PLEIOTROPY
[Figure: a "hairball" depiction of the E. coli metabolic network extracted from KEGG and visualized using Cytoscape. Reactions are in magenta and chemical compounds are in green. Source: http://www.kavrakilab.org/bioinformatics/metapath]
High pleiotropy & small-world networks (E. coli, yeast metabolic networks):
- Many components
- Some are hubs: « small world » property, « scale-free » property
Slide19
Hierarchy: phenotypic networks are « integrated »
[Figure source: Yan K et al. PNAS (2010)]
- FBA optimization functions: growth rate = f(metabolite fluxes); far fewer metabolites enter the function than are present in the total system
- Hierarchy and integration: many « mutable » traits => fewer « optimized » traits
Slide20
UNIQUE LOCAL OPTIMUM: proximity of a local optimum
[Figure: adaptation to glucose minimal medium in E. coli, Lenski & Travisano (1994): mean Malthusian relative fitness vs. generations]
- Experimental evolution: fitness curves saturate
- FBA: flux distributions are close to « optimal » in evolved systems (Schuetz et al. Science (2012))
Slide21
Model Assumptions
1. Pleiotropy
2. Weak unbiased mutations: general distribution, unbiased
3. Phenotypic integration
4. Local optimum: around a local optimum
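Slide22
Key local approximations
(4) Nearby fitness optimum => there is a basis for the optimized traits $\mathbf{z}$ (optimum at $\mathbf{z} = \mathbf{0}$) where log fitness is quadratic: $\log w(\mathbf{z}) \approx \log w_{\max} - \tfrac{1}{2}\,\mathbf{z}^\top \mathbf{z}$.
(2) Mutations have mild effects => linear approximation around the parent: $d\mathbf{z} \approx B\,d\mathbf{y}$, where $B$ is the matrix containing all « pathway coefficients » from the mutable traits $\mathbf{y}$ to the optimized traits $\mathbf{z}$.
Local analysis & Central Limit Theorem: $M = B\,C\,B^\top$, where
- $M$: mutational covariance between optimized traits
- $C$: mutational covariances between mutable traits
- $B$: pathway coefficients between mutable and optimized traits
Large-number approximation 1: CLT => anisotropic Fisher's model
- Optimized traits are Gaussian: Gaussian effects of mutation, $d\mathbf{z} \sim \mathcal{N}(\mathbf{0}, M)$
- Quadratic fitness function on traits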
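The CLT step above can be checked with a small simulation: mutation effects on many mutable traits are drawn from a deliberately non-Gaussian (uniform) distribution, mapped to a few optimized traits through random pathway coefficients, and the resulting effects are compared with the predicted covariance $M = B\,C\,B^\top$ and with Gaussian kurtosis. The dimensions, the uniform distribution and the random $B$ are illustrative assumptions, not values from the article.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, n_mut = 400, 5, 10_000   # m mutable traits, n optimized traits, mutants

# Hypothetical pathway coefficients B (n x m) and diagonal mutational variances
# C for the mutable traits. Effects dy are uniform, i.e. deliberately NOT Gaussian.
B = rng.normal(size=(n, m)) / np.sqrt(m)
c_diag = rng.uniform(0.5, 1.5, size=m)
half_width = np.sqrt(3 * c_diag)            # Uniform(-h, h) has variance h**2 / 3
dy = rng.uniform(-half_width, half_width, size=(n_mut, m))

dz = dy @ B.T                               # linear approximation dz = B dy
M_pred = B @ np.diag(c_diag) @ B.T          # predicted M = B C B^T
M_emp = np.cov(dz, rowvar=False)            # empirical covariance of dz

# Excess kurtosis per optimized trait: 0 for a Gaussian, -1.2 for one uniform variable.
std_dz = (dz - dz.mean(axis=0)) / dz.std(axis=0)
excess_kurtosis = np.mean(std_dz**4, axis=0) - 3
print("max |M_emp - M_pred| =", np.abs(M_emp - M_pred).max())
print("excess kurtosis of dz:", np.round(excess_kurtosis, 2))
```

Each optimized trait sums hundreds of small uniform contributions, so its excess kurtosis ends up near 0 (Gaussian) rather than near -1.2 (uniform), while its covariance matches $B\,C\,B^\top$.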
Slide24
Isotropy and the spectral distribution of M
Eigenvalues of $M$: $\lambda_i$. Spectral distribution of $M$: distribution of the eigenvalues across traits.
[Figure: isofitness contours under isotropy and anisotropy]
- Isotropy: all eigenvalues equal; only the distance to the optimum matters
- Anisotropy: direction, not just distance, has an effect => less predictable from fitness data alone
Slide25
Distribution of the eigenvalues of M
Many coefficients, no idea of their values: consider them as random => large random matrices.
Theory of spectral distributions = Random Matrix Theory: many known results (a branch of probability theory, applied to nuclear physics, statistics, neurology, etc.). Overlooked in ecology/evolution, I think.
Large-number approximation 2: random matrix theory
Slide26
Large-number approximation 2: random matrix theory
Pleiotropy plus integration (many mutable traits, fewer optimized traits): the spectral distribution of $M$ converges to the Marçenko–Pastur distribution when $B$ contains iid elements with zero mean.
A form of central limit theorem: independent of the distribution of the elements in $B$ (same general conditions as in the central limit theorem). Convergence is fast.
V. A. Marçenko and L. A. Pastur (1967). Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. (N.S.), 72(114): 507–536.
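A quick numerical illustration of this convergence, assuming $C = I$ and pathway coefficients scaled to variance $1/m$, so that the limiting Marčenko–Pastur law has mean 1 and ratio $c = n/m$ (its support is $[(1-\sqrt{c})^2, (1+\sqrt{c})^2]$ and its coefficient of variation is $\sqrt{c}$). Two zero-mean, unit-variance coefficient distributions are compared to show the universality claimed on the slide; the dimensions and the helper `mp_edges` are our own illustrative choices.

```python
import numpy as np


def mp_edges(c, sigma2=1.0):
    """Support [a, b] of the Marchenko-Pastur law with ratio c <= 1 and mean sigma2."""
    return sigma2 * (1 - np.sqrt(c)) ** 2, sigma2 * (1 + np.sqrt(c)) ** 2


rng = np.random.default_rng(3)
n, m = 200, 2000                        # optimized traits n, mutable traits m
c = n / m
samplers = {                            # two zero-mean, unit-variance distributions
    "normal": lambda size: rng.normal(size=size),
    "uniform": lambda size: rng.uniform(-np.sqrt(3), np.sqrt(3), size=size),
}
a, b = mp_edges(c)
spectra = {}
for name, draw in samplers.items():
    B = draw((n, m)) / np.sqrt(m)       # iid coefficients with variance 1/m
    eig = np.linalg.eigvalsh(B @ B.T)   # spectrum of M = B B^T (C = identity)
    spectra[name] = eig
    print(f"{name}: mean={eig.mean():.3f}, CV={eig.std() / eig.mean():.3f} "
          f"(MP: 1, {np.sqrt(c):.3f}); range [{eig.min():.2f}, {eig.max():.2f}] "
          f"vs MP support [{a:.2f}, {b:.2f}]")
```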
Slide27
Pb: $B$ contains non-iid elements: iid elements with zero mean, combined with some fixed covariance matrix.
=> Approximation of the limit spectral distribution (LSD) of $M$ by a Marçenko–Pastur law with equivalent dimensions and some scale (the Marçenko–Pastur approximation), expressed through the coefficient of variation of the eigenvalues of that covariance matrix.
Sketch of derivation: use the S transform of the LSD of $M$, approximate it to leading order, and retrieve the S transform of an MP law. The S transform uniquely determines the LSD.
[Figure: the Marçenko–Pastur distribution for uniform, normal, and mixture-of-the-two distributions of the elements in B]
Key ratio shaping the distribution of the eigenvalues of M: number of mutable/optimized traits.
Covariance between pathway coefficients & mutable traits => reduction to equivalent dimensions: less isotropic.
Slide30
High developmental integration => isotropy
Developmental integration (many more mutable than optimized traits) => ISOTROPIC Fisher's model: all traits ~ equivalent for selection & mutation.
Fairly general conclusion, as the conditions are general for:
- the distribution of mutation effects on mutable traits
- the developmental function relating mutable to optimized traits
But things can be slightly more complicated…
Effect of nonzero mean (bias)
Pb: $B$ contains elements with nonzero mean: « equivalent » iid elements with zero mean + some fixed matrix of small rank.
Use a result from Benaych-Georges & Nadakuditi (2011). Advances in Mathematics 227(1): phase transition of the first eigenvalues out of the MP law as the coefficient of variation of the elements decreases (bias).
Here: just the maximum eigenvalue, and we may use the result in the MP approximation limit.
Slide32
Phase transition iff the bias is large enough, as measured through:
- the vector of means of the coefficients in $B$
- the matrix of covariance between coefficients in $B$
Bias => anisotropy. So far, pathway coefficients = unbiased. If there is (sufficient) bias in the distribution of coefficients: a predictable « phase transition » to anisotropy = a single leading direction.
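The phase transition can be seen in a small simulation: give all pathway coefficients a common mean `mu` (before the $1/\sqrt{m}$ scaling) and follow the two largest eigenvalues of $M = B B^\top$ relative to the Marçenko–Pastur bulk edge. The setting ($C = I$, Gaussian noise of unit variance, the dimensions, the values of `mu`, the helper `top_two_eigenvalues`) is a hypothetical illustration; the exact threshold is the one given by the Benaych-Georges & Nadakuditi (2011) result and is not computed here.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 50, 500                       # optimized traits n, mutable traits m
c = n / m
bulk_edge = (1 + np.sqrt(c)) ** 2    # MP upper edge for unbiased, unit-variance coefficients


def top_two_eigenvalues(mu):
    """Two largest eigenvalues of M = B B^T; coefficients have mean mu before scaling."""
    B = (rng.normal(size=(n, m)) + mu) / np.sqrt(m)
    eig = np.linalg.eigvalsh(B @ B.T)    # ascending order
    return eig[-1], eig[-2]


for mu in [0.0, 0.05, 0.5]:
    first, second = top_two_eigenvalues(mu)
    print(f"mu={mu}: largest={first:.2f}, second={second:.2f}, bulk edge={bulk_edge:.2f}")
```

Without bias the largest eigenvalue sits at the bulk edge. With a strong bias (`mu = 0.5`, i.e. a coefficient of variation of 2 for the pathway coefficients) a single eigenvalue separates from the bulk while the second one stays near the edge: one leading direction.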
Slide34
DFE
The cumulant generating function of the DFE is linked to the spectral distribution of $M$ through:
- the Shannon transform of the spectral distribution of $M$
- the fitness distance of the parent genotype to the optimal genotype
Below the phase transition: Marçenko–Pastur limit, with an analytic form. Approximately isotropic.
Beyond the phase transition: convolution of the Marçenko–Pastur limit for the smallest eigenvalues and the contribution from the dominant eigenvalue. Anisotropy: analytic DFE, depends on fitness distances in two subspaces.
The transition and its effect on DFEs: simulation vs. analytic
Below the transition: the isotropic Fisher model is accurate.
Developmental bias => anisotropic FGM: fatter tails.
No directionality effects / directionality critical: behaves ~ as if there were one leading dimension.
In all cases: the network model converges to a much simpler model with 3 to 5 measurable parameters (FGM).
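One way to see why developmental bias fattens the tails: keep the parent at the optimum and the total mutational variance (the trace of $M$) fixed, then compare an isotropic spectrum with one dominated by a single leading eigenvalue. Under our assumed quadratic/Gaussian setting ($\log w = -\tfrac{1}{2}\mathbf{z}^\top\mathbf{z}$, $d\mathbf{z} \sim \mathcal{N}(\mathbf{0}, M)$, parent at $\mathbf{0}$), $s = -\tfrac{1}{2}\sum_i \lambda_i \varepsilon_i^2$ with $\varepsilon_i$ iid standard normal, so its cumulants are $\kappa_r = (-1)^r \tfrac{(r-1)!}{2}\sum_i \lambda_i^r$. The eigenvalues below are hypothetical.

```python
import numpy as np


def dfe_moments_at_optimum(lams):
    """Mean, variance and excess kurtosis of s = -0.5 * sum_i lam_i * eps_i**2."""
    lams = np.asarray(lams, dtype=float)
    mean = -0.5 * lams.sum()
    var = 0.5 * np.sum(lams**2)
    excess_kurtosis = 12 * np.sum(lams**4) / np.sum(lams**2) ** 2
    return mean, var, excess_kurtosis


n = 10
iso = np.full(n, 0.01)                         # isotropic: all eigenvalues equal
aniso = np.r_[0.055, np.full(n - 1, 0.005)]    # one leading direction, same trace
for name, lams in [("isotropic", iso), ("anisotropic", aniso)]:
    mean, var, kurt = dfe_moments_at_optimum(lams)
    print(f"{name}: mean={mean:.3f}, var={var:.5f}, excess kurtosis={kurt:.2f}")

# Monte Carlo check of the anisotropic case
rng = np.random.default_rng(5)
eps = rng.normal(size=(200_000, n))
s_aniso = -0.5 * (eps**2) @ aniso
```

Both spectra give the same mean (-0.05), but the biased one has more than three times the variance and an excess kurtosis of about 10.4 instead of 1.2.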
Slide36
Empirical DFEs
Engineered SNPs in two ribosomal genes of Salmonella + estimate of s. NB: nonsynonymous mutations had the same DFE as synonymous ones (!!) (Lind et al. Science (2010)).
FGM fits of the DFEs in the two genes: no anisotropy detected in one, weak anisotropy detected in the other.
Slide37
Conclusions
Try to justify Fisher's model « from first principles »: it emerges from a much more complex network model, given a few qualitative properties.
Perspective: analyze empirical network models.
Variation of the DFE among genes/modules: some parameters vary but this has little effect; does the scale vary, or average out?
Beyond or below the phase transition = essential vs. non-essential genes? Cause of parallel evolution?
Perspective: study different gene-specific DFEs.
A clearer definition of traits in the FGM: which pleiotropy is important for adaptation.
Collaborators on Fisher's geometrical model: Luis M. Chevin (CEFE Montpellier) (who also proofread the article), Thomas Lenormand (CEFE Montpellier), Sylvain Gandon (CEFE Montpellier), David Waxman (Fudan University, Shanghai), Ophélie Ronce (ISEM Montpellier)
For details see: Martin G. « Fisher's geometrical model emerges as a property of complex integrated phenotypic networks », Genetics 197(1): 237-255.
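Slide39
Marçenko–Pastur law
When the elements of $B$ are iid with zero mean and the numbers of mutable and optimized traits are large, the spectral distribution of $M$ follows the Marçenko–Pastur law.
Parameters: the ratio $c$ (number of optimized / mutable traits) and a scale $\sigma^2$ set by the mean of the eigenvalues.
PDF: $f(x) = \dfrac{\sqrt{(b - x)(x - a)}}{2\pi\,\sigma^2 c\,x}$ for $a \le x \le b$, with $a = \sigma^2(1 - \sqrt{c})^2$ and $b = \sigma^2(1 + \sqrt{c})^2$ (case $c \le 1$).
Convergence to isotropy as $c \to 0$, i.e. with higher phenotypic integration: the coefficient of variation of the eigenvalues is $\sqrt{c}$.
Marçenko–Pastur approximation under anisotropy (equivalent pleiotropy): ratio and scale expressed through the coefficient of variation of the eigenvalues of the covariance matrix.
Tools: approximate some generating functions associated with the spectral distribution of $M$.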
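A small numerical sketch of the Marčenko–Pastur law discussed on this slide, using its standard density $\sqrt{(b-x)(x-a)}/(2\pi\sigma^2 c\,x)$ on $[a, b]$ with $a = \sigma^2(1-\sqrt{c})^2$, $b = \sigma^2(1+\sqrt{c})^2$ and $0 < c \le 1$. It checks on a fine grid that the density integrates to 1, has mean $\sigma^2$, and has coefficient of variation $\sqrt{c}$, which is why the spectrum collapses onto its mean (isotropy) as integration increases. The function name `mp_pdf` and the grid are our own choices.

```python
import numpy as np


def mp_pdf(x, c, sigma2=1.0):
    """Marchenko-Pastur density with ratio 0 < c <= 1 and mean sigma2 (0 outside [a, b])."""
    a = sigma2 * (1 - np.sqrt(c)) ** 2
    b = sigma2 * (1 + np.sqrt(c)) ** 2
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > a) & (x < b)
    xi = x[inside]
    dens[inside] = np.sqrt((b - xi) * (xi - a)) / (2 * np.pi * sigma2 * c * xi)
    return dens


x = np.linspace(0.0, 3.0, 300_001)
dx = x[1] - x[0]
summary = {}
for c in [0.5, 0.1, 0.01]:
    f = mp_pdf(x, c)
    mass = np.sum(f) * dx                      # Riemann sum over a fine grid
    mean = np.sum(x * f) * dx
    sd = np.sqrt(np.sum((x - mean) ** 2 * f) * dx)
    summary[c] = (mass, mean, sd / mean)
    print(f"c={c}: mass={mass:.4f}, mean={mean:.4f}, CV={sd / mean:.4f}, sqrt(c)={np.sqrt(c):.4f}")
```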
Slide41
« Phase transition » to anisotropy
So far, the key assumption was unbiased pathway coefficients, giving the Marçenko–Pastur law. If the pathway coefficients have a strongly biased distribution: phase transition.
The transition is controlled by a quantity that is ~ a coefficient of variation of the pathway coefficients: when it falls below a threshold, the maximal eigenvalue rises above the bulk.
Tools: simple application of Benaych-Georges & Nadakuditi (2011).
Slide43
« Phase transition » to anisotropy in the general case
Anisotropy: for a given total distance to the optimum, directions affect DFEs.
DFEs: empirical parameterization
Isotropic case or anisotropic case: we can estimate the model parameters (in permissive conditions).
[Figure: anisotropic fit vs. isotropic fit]
Strikingly similar estimates in the two genes; one suggests a significant anisotropy.
Ex: random single-nucleotide substitutions in two ribosomal genes in Salmonella. Data: Lind et al. (2010), Science (pooled synonymous and nonsynonymous mutations).
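As a rough sketch of how the isotropic parameters could be estimated from a DFE measured in permissive conditions: with the parent at the optimum, the isotropic model gives $-s \sim \text{Gamma}(\text{shape } n/2, \text{scale } \lambda)$, so the shape and scale can be matched to the sample mean and variance. This moment matching, the helper name `fit_isotropic_fgm_moments` and the simulated data are our assumptions; it ignores measurement error, and the article's own estimation procedure may differ (e.g. likelihood-based).

```python
import numpy as np


def fit_isotropic_fgm_moments(s):
    """Method-of-moments fit of s = -Gamma(shape=n/2, scale=lam), parent at the optimum."""
    s = np.asarray(s, dtype=float)
    mean, var = s.mean(), s.var(ddof=1)
    shape = mean**2 / var          # Gamma shape k = mean^2 / variance
    scale = var / abs(mean)        # Gamma scale = variance / |mean|
    return {"n_hat": 2 * shape, "lam_hat": scale}


rng = np.random.default_rng(6)
n_true, lam_true = 4, 0.02
s = -rng.gamma(shape=n_true / 2, scale=lam_true, size=20_000)   # simulated, not real data
print(fit_isotropic_fgm_moments(s))
```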
Slide45
Applications: phase transition and essential genes
Essential genes (lethal when deleted): genes/modules that sample a set of strongly biased pathway coefficients (beyond the phase transition): potential to create lethals.
Non-essential genes: sample unbiased pathway distributions (below the phase transition). They may all have roughly the same parameters because:
- the same set of traits is under optimizing selection (genes differ in the mutable traits they affect)
- they randomly sample pathway coefficients among all pathways (averaging out)
Testable: look for essential genes, produce single mutants and estimate the parameters as above.
Predictions in the isotropic case: the model may explain why
- the DFE has a Gamma shape in permissive conditions (permissive = local analysis about the optimum: our most assumption-free predictions)
- the isotropic model seems to work: the genomic DFE is an average over all modules; if most of them are below the phase transition, the genomic DFE is ~ isotropic
Parallel evolution in response to some stresses (e.g. antibiotic resistance): genes beyond the phase transition have a strong response in one direction; if the stress requires this direction, they respond more than average.
Testable: estimate the anisotropy in the genes involved in parallel evolution.
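Isotropy in the general case
Method: approximate the cumulant generating function (CGF) of the DFE by that of the isotropic distribution.
CGF of the DFE, ignoring the parent's distance to the optimum (parent at the optimum, $\log w = -\tfrac{1}{2}\mathbf{z}^\top\mathbf{z}$, $d\mathbf{z} \sim \mathcal{N}(\mathbf{0}, M)$, eigenvalues $\lambda_1, \dots, \lambda_n$ of $M$):
$C(t) = -\tfrac{1}{2}\sum_{i=1}^{n} \log(1 + t\lambda_i) = -\tfrac{n}{2}\,\mathcal{V}_M(t)$, where $\mathcal{V}_M(t) = \tfrac{1}{n}\sum_{i=1}^{n} \log(1 + t\lambda_i)$ is the Shannon transform of the spectral distribution of $M$.
Isotropy ($\lambda_i = \lambda$ for all $i$): $C(t) = -\tfrac{n}{2}\log(1 + t\lambda)$, i.e. Gamma approximation: $-s \sim \text{Gamma}(\text{shape } n/2, \text{scale } \lambda)$.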
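The approximation step can be sketched numerically: draw an unbiased random $B$, take the eigenvalues of $M = B B^\top$ (rescaled by an arbitrary factor 0.01), and compare $C(t) = -\tfrac{1}{2}\sum_i \log(1 + t\lambda_i)$ with the isotropic CGF built on the mean eigenvalue. The assumptions ($C = I$, parent at the optimum, the dimensions and the range of $t$) are illustrative. By Jensen's inequality the gap $C(t) - C_{iso}(t)$ is never negative for $t \ge 0$, and it shrinks as the number of mutable traits $m$ grows.

```python
import numpy as np


def cgf_at_optimum(t, lams):
    """C(t) = -0.5 * sum_i log(1 + t * lam_i): CGF of s with the parent at the optimum."""
    return -0.5 * np.sum(np.log1p(np.outer(t, lams)), axis=1)


def cgf_isotropic(t, n, lam_bar):
    """Same CGF when all n eigenvalues equal lam_bar, i.e. -s ~ Gamma(n/2, lam_bar)."""
    return -0.5 * n * np.log1p(t * lam_bar)


rng = np.random.default_rng(7)
n = 10                                 # optimized traits
t = np.linspace(0.0, 50.0, 11)
gaps, diffs = [], []
for m in [20, 200, 2000]:              # mutable traits: integration grows with m
    B = rng.normal(size=(n, m)) / np.sqrt(m)
    lams = 0.01 * np.linalg.eigvalsh(B @ B.T)      # arbitrary overall scale 0.01
    diff = cgf_at_optimum(t, lams) - cgf_isotropic(t, n, lams.mean())
    diffs.append(diff)
    gaps.append(diff.max())
    print(f"m={m}: max [C(t) - C_iso(t)] on [0, 50] = {diff.max():.5f}")
```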
Slide48
Stochastic representation of the DFE
The DFE is fully determined by:
- the eigenvalues of $M$
- the parental positions in the diagonal system
Cumulant generating function (CGF) of the DFE, $C(t) = \log E[e^{ts}]$:
- the CGF fully characterizes the distribution (as the pdf does)
- the derivatives of the CGF at $t = 0$ provide all the cumulants of the DFE
- the first three cumulants are the mean, the variance and the third central moment (which sets the skewness)
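Under the quadratic/Gaussian setting assumed in these sketches ($\log w = -\tfrac{1}{2}\mathbf{z}^\top\mathbf{z}$, $d\mathbf{z} \sim \mathcal{N}(\mathbf{0}, M)$, with $\lambda_i$ the eigenvalues of $M$ and $z_{o,i}$ the parent's coordinates in its eigenbasis), a direct calculation gives the stochastic representation $s = -\sum_i \big(\tfrac{1}{2}\lambda_i\varepsilon_i^2 + \sqrt{\lambda_i}\,z_{o,i}\,\varepsilon_i\big)$ with $\varepsilon_i$ iid standard normal, and the CGF $C(t) = \sum_i \big[-\tfrac{1}{2}\log(1 + t\lambda_i) + \tfrac{t^2\lambda_i z_{o,i}^2}{2(1 + t\lambda_i)}\big]$. The sketch below checks that finite-difference derivatives of $C$ at $t = 0$ return the first three cumulants and that they agree with Monte Carlo draws. The eigenvalues and parental positions are hypothetical, and the article's notation may differ.

```python
import numpy as np


def dfe_cgf(t, lams, z_o):
    """CGF of s = -sum_i (0.5*lam_i*eps_i**2 + sqrt(lam_i)*z_o_i*eps_i); needs t > -1/max(lams)."""
    lams = np.asarray(lams, dtype=float)
    z_o = np.asarray(z_o, dtype=float)
    one_plus = 1.0 + t * lams
    return np.sum(-0.5 * np.log(one_plus) + t**2 * lams * z_o**2 / (2.0 * one_plus))


def cumulants_from_cgf(cgf, h=1e-3):
    """First three cumulants from central finite differences of the CGF at t = 0."""
    f = {k: cgf(k * h) for k in (-2, -1, 0, 1, 2)}
    k1 = (f[1] - f[-1]) / (2 * h)
    k2 = (f[1] - 2 * f[0] + f[-1]) / h**2
    k3 = (f[2] - 2 * f[1] + 2 * f[-1] - f[-2]) / (2 * h**3)
    return k1, k2, k3


# Hypothetical eigenvalues of M and parental coordinates in the eigenbasis
lams = np.array([0.05, 0.02, 0.01, 0.01])
z_o = np.array([0.2, 0.0, 0.1, 0.0])
k1, k2, k3 = cumulants_from_cgf(lambda t: dfe_cgf(t, lams, z_o))

# Closed forms obtained by differentiating the CGF at t = 0
exact = (-0.5 * lams.sum(),
         np.sum(0.5 * lams**2 + lams * z_o**2),
         -np.sum(lams**3 + 3 * lams**2 * z_o**2))
print("finite differences:", k1, k2, k3)
print("closed forms:      ", exact)

# Monte Carlo draws from the stochastic representation
rng = np.random.default_rng(8)
eps = rng.normal(size=(200_000, lams.size))
s = -(0.5 * lams * eps**2 + np.sqrt(lams) * z_o * eps).sum(axis=1)
```

With all $z_{o,i} = 0$ the second term vanishes and the CGF reduces to the Shannon-transform expression for a parent at the optimum.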
Empirical measurement
1. Create a set of single random mutants:
- mutation accumulation in highly inbred conditions -> natural mutations
- site-directed mutagenesis -> SNPs
- transposon mutagenesis -> indels
- select resistance mutations and evaluate them in permissive conditions -> tricky (covariance between environments)
Slide50
[Figure: transcriptional regulatory networks of yeast and E. coli]
Many components; pyramidal hierarchy.