Uploaded On 2022-07-28


Slide1

Probabilistic Modeling of Molecular Evolution Using Excel, AgentSheets, and R

Jeff Krause (Shodor)

Slide2

Biological Sequence Space is Discrete

Probability theory is crucial to understanding sequences

Furthermore, metrics and algorithms for analyzing sequences are statistical

Slide3

Sequence Comparison: Similarity/Distance

CAGTTAGCTC
CATTAAGCTC

Proportional distance (p) = (# of differences in aligned positions) / (# of aligned positions)

p = 2 differences in 10 positions

p = 0.2
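The proportional distance on this slide can be checked with a few lines of Python. This is a minimal sketch, assuming the two aligned sequences read as CAGTTAGCTC and CATTAAGCTC; the function name `proportional_distance` is illustrative and not part of the original material.

```python
def proportional_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    # Count positions where the aligned characters are not identical.
    differences = sum(1 for x, y in zip(seq_a, seq_b) if x != y)
    return differences / len(seq_a)


# The two sequences from the slide: 2 differences in 10 positions.
print(proportional_distance("CAGTTAGCTC", "CATTAAGCTC"))
```

Gaps are not handled here; a real alignment would need a rule for positions where one sequence has an insertion or deletion.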

Slide4

Character Evolution in Biological Sequences

CAGTTAGCTC
||+|+||+||
CATTCAGATC
||||+||+||
CATTAAGCTC

(| marks a matching position, + a differing one)

Slide5

Probabilistic Modeling of Sequence Evolution

During replication (and recombination), errors occur

Substitution, insertion, deletion, inversion, translocation, …

Different types of errors have different chances of occurring, but they can all be considered to happen randomly

To develop models we need data to indicate how often each type of error occurs

Use observed frequencies and our understanding of the mechanism of error occurrence to estimate probabilities of errors
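As a rough illustration of turning observed frequencies into probability estimates, the sketch below tallies substitutions between two aligned sequences. It assumes, purely for illustration, that CATTCAGATC is the ancestor and CAGTTAGCTC the descendant; the function name `substitution_frequencies` is hypothetical, and a real estimate would need far more data than ten positions.

```python
from collections import Counter


def substitution_frequencies(ancestor: str, descendant: str) -> dict:
    """Relative frequency of each (from, to) substitution per aligned position."""
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    # Tally only positions where the character changed.
    counts = Counter((x, y) for x, y in zip(ancestor, descendant) if x != y)
    n_sites = len(ancestor)
    return {pair: count / n_sites for pair, count in counts.items()}


# Assumed ancestor/descendant pair (the direction of change is an assumption).
estimates = substitution_frequencies("CATTCAGATC", "CAGTTAGCTC")
for (old, new), freq in sorted(estimates.items()):
    print(f"{old} -> {new}: {freq}")
```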

Slide6

Stochastic Modeling of Sequences … and More

Yesterday we talked about dynamic modeling with difference equations and rates of change

Instead of thinking in terms of rate or proportion per unit time, we could think in terms of probability of occurring within a time-step

So all of our dynamic models yesterday could be modeled probabilistically
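To make the rate-versus-probability idea concrete, here is a hedged sketch that runs the same process two ways. The decay model is a made-up example, not one of the models from the earlier session: the deterministic version removes a fixed proportion per time-step, while the stochastic version gives each individual that same value as a probability of being removed in a step.

```python
import random


def deterministic_decay(n0, rate, steps):
    """Difference equation: N(t+1) = N(t) * (1 - rate)."""
    values = [float(n0)]
    for _ in range(steps):
        values.append(values[-1] * (1.0 - rate))
    return values


def stochastic_decay(n0, prob, steps, rng):
    """Each individual is removed with probability `prob` in each time-step."""
    counts = [n0]
    for _ in range(steps):
        survivors = sum(1 for _ in range(counts[-1]) if rng.random() >= prob)
        counts.append(survivors)
    return counts


rng = random.Random(1)  # fixed seed so the run is repeatable
print(deterministic_decay(1000, 0.1, 5))
print(stochastic_decay(1000, 0.1, 5, rng))
```

The stochastic run always yields whole numbers and varies from seed to seed, while the deterministic run yields fractional values; on average the two follow the same curve.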

Slide7

Probability Terms – Part I

Trial – A single occurrence of a random process (e.g. a single coin toss, or roll of a die)

Outcome – The result of a trial (e.g. “heads” or “tails”)

Probability – The chance of a random outcome occurring (e.g. p(heads) = 0.5, p(3) = 1/6)

Frequency – The number of occurrences of an outcome

Relative frequency – Number of occurrences of an outcome divided by the total number of trials

Slide8

Probability Terms – Part II

Event – A grouping of multiple outcomes (e.g. “roll an odd number”, “roll less than 5”)

Independent – The probability of an outcome is not influenced by the outcomes of previous trials

Multiplication rule – The probability that two independent outcomes will both occur is the product of their individual probabilities (e.g. “toss two heads”, “roll is odd and <= 4”)
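The multiplication rule examples can be verified by direct enumeration. This sketch assumes a fair six-sided die and a fair coin; the helper `prob` is an illustrative name, and exact fractions are used so the comparison is not blurred by floating-point rounding.

```python
from fractions import Fraction

DIE = {1, 2, 3, 4, 5, 6}  # sample space of a fair die (assumed fair)


def prob(event, sample_space=DIE):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


odd = {1, 3, 5}
at_most_4 = {1, 2, 3, 4}

# "Roll is odd and <= 4" is the intersection {1, 3}.
direct = prob(odd & at_most_4)
by_rule = prob(odd) * prob(at_most_4)
print(direct, by_rule)  # the two agree, so the events are independent

# "Toss two heads": two independent tosses of a fair coin.
p_heads = Fraction(1, 2)
print(p_heads * p_heads)
```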

Slide9

Coin Toss

Create an Excel worksheet that conducts 10, 100, or 1000 trials of a coin toss

Tally the frequency of heads for each number of trials

Is it a fair coin? How do you know?

Variation in composition vs. sample size

Degree of variation with sample size (range, i.e. max – min, of the # of heads)

Probably smaller with small sample sizes, since sample size limits the range

Larger sample size makes variation across a larger range possible

Proportional variation vs. sample size

Much larger for small samples, since small differences in observed frequency have a larger effect when divided by the small denominator of a small sample

Probability of a given sequence of outcomes – multiplication rule for multiple independent events

Permutations – 2^n different n-length sequences

Probability of an event specifying sequence composition – enumerate the permutations and sum the probabilities of those matching the criteria; this is the addition rule
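The slide calls for an Excel worksheet; the Python sketch below is an assumed stand-in that runs the same experiment. It repeats the 10-, 100-, and 1000-toss experiments, reports the range of head counts and of head proportions, and then enumerates all 2^n toss sequences to apply the addition rule. The function names and the choice of 50 repeats are illustrative.

```python
import random
from itertools import product


def count_heads(n_trials, rng, p_heads=0.5):
    """Number of heads in n_trials independent tosses."""
    return sum(1 for _ in range(n_trials) if rng.random() < p_heads)


def spread(n_trials, n_repeats, rng):
    """Range (max - min) of head counts, and the same range as a proportion of n_trials."""
    counts = [count_heads(n_trials, rng) for _ in range(n_repeats)]
    count_range = max(counts) - min(counts)
    return count_range, count_range / n_trials


def prob_exact_heads(n, k):
    """Addition rule: sum the probabilities of all n-toss sequences with exactly k heads."""
    p_sequence = 0.5 ** n  # multiplication rule for one specific sequence
    matching = [s for s in product("HT", repeat=n) if s.count("H") == k]
    return len(matching) * p_sequence


rng = random.Random(42)  # fixed seed so the run is repeatable
for n in (10, 100, 1000):
    count_range, proportion_range = spread(n, 50, rng)
    print(n, count_range, round(proportion_range, 3))

print(prob_exact_heads(3, 2))  # sequences HHT, HTH, THH
```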

Slide10

Die Roll

Is it a fair die? How do you know?

Event probabilities for a single trial

p(odd), p(odd and < 5)

Event probabilities for two-trial events

p(2,5), p(2,5 in any order), p(sum is 7) – (use plop-it)

Union and intersection – explore the addition rule and mutual exclusivity, as well as the multiplication rule and independence

Number of possible sequences of length n = 6^n, so for a fair die the probability of a given sequence of length n = (1/6)^n
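The two-roll event probabilities can be worked out by listing all 36 ordered outcomes. This is a sketch assuming a fair die; `ROLLS`, `prob`, and `prob_of_sequence` are illustrative names.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two rolls of a fair die.
ROLLS = list(product(range(1, 7), repeat=2))


def prob(predicate):
    """Probability that a two-roll outcome satisfies the predicate."""
    matches = [roll for roll in ROLLS if predicate(roll)]
    return Fraction(len(matches), len(ROLLS))


def prob_of_sequence(n):
    """Probability of one specific sequence of n rolls (multiplication rule)."""
    return Fraction(1, 6) ** n


p_2_then_5 = prob(lambda r: r == (2, 5))
p_2_and_5_any_order = prob(lambda r: sorted(r) == [2, 5])
p_sum_is_7 = prob(lambda r: sum(r) == 7)

print(p_2_then_5, p_2_and_5_any_order, p_sum_is_7)
print(prob_of_sequence(2))
```

"2,5 in any order" is the union of the mutually exclusive outcomes (2,5) and (5,2), so its probability is the sum of theirs.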

Slide11

Probability Terms – Part III

Union – The event that either or both of two events will occur on a trial (e.g. the union of “odd” and “>4” is “1,3,5,6”)

Mutually exclusive – Two events are mutually exclusive if they can’t occur simultaneously (e.g. “roll an even number” and “roll an odd number”)

Addition rule – The probability of an event consisting of mutually exclusive outcomes is the sum of the probabilities of the outcomes (e.g. p(heads or tails) = p(heads) + p(tails))

Complement – The complement to any event includes all possible outcomes not in the event (e.g. “not heads”, “not 5”). The probability of the complement is (1 – the probability of the event)

Exhaustive – A set of outcomes is exhaustive if it includes all possible outcomes; the probabilities of mutually exclusive, exhaustive outcomes must sum to 1
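These terms map directly onto set operations. The sketch below assumes a fair die and uses Python sets for events. The general addition rule for overlapping events, P(A or B) = P(A) + P(B) - P(A and B), is not stated on the slide and is included here only as a standard result, to show why mutual exclusivity matters.

```python
from fractions import Fraction

DIE = frozenset(range(1, 7))  # sample space of a fair die (assumed fair)


def prob(event):
    """Probability of an event (a subset of DIE) with equally likely outcomes."""
    return Fraction(len(event), len(DIE))


odd = {1, 3, 5}
even = {2, 4, 6}
greater_than_4 = {5, 6}

# Union: either or both events occur.
union = odd | greater_than_4
print(sorted(union))

# Mutually exclusive events: probabilities simply add.
print(prob(odd | even), prob(odd) + prob(even))

# Overlapping events share the outcome 5, so the overlap is subtracted once.
overlap = odd & greater_than_4
print(prob(union), prob(odd) + prob(greater_than_4) - prob(overlap))

# Complement: "not 5".
not_five = DIE - {5}
print(prob(not_five), 1 - prob({5}))
```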

Slide12

Molecular Evolution and Phylogenetics

Biology basics

Central dogma: DNA -> RNA -> protein

DNA replication and processing can lead to changes in DNA composition

Metrics of distance

Observed substitution frequencies

“How often do we see A replaced with C”

Distance based on evolutionary model

“How many events separate these two sequences”

Markov Models of Sequence Evolution

Markov process – future state only depends on current state, not how it got there

Molecular genetic mechanisms at multiple scales with distinct probabilities

Single site events – sequences

Events at larger scales

Slide13

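Nucleotide substitution: Jukes-Cantor model

[Diagram residue: labels C C A T G; substitutions among the nucleotides A, C, G, and T, each arrow labeled with rate a]

Markov process

Substitution rates are equal

Nucleotides are in equal abundance

Rate matrix = M = {m_ij}, with rows and columns ordered A, C, G, T:

$$
M = \{m_{ij}\} =
\begin{pmatrix}
-3a & a & a & a \\
a & -3a & a & a \\
a & a & -3a & a \\
a & a & a & -3a
\end{pmatrix}
$$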
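The rate matrix described on this slide (every off-diagonal entry equal to a, every diagonal entry equal to -3a) can be built in a few lines. This is a minimal sketch using plain nested lists; the function name and the example value a = 0.1 are assumptions for illustration.

```python
NUCLEOTIDES = "ACGT"  # row and column order of the matrix


def jukes_cantor_rate_matrix(a):
    """4x4 rate matrix: rate a for every substitution, -3a on the diagonal."""
    size = len(NUCLEOTIDES)
    return [[-3.0 * a if i == j else a for j in range(size)] for i in range(size)]


M = jukes_cantor_rate_matrix(0.1)  # a = 0.1 is an arbitrary example value
for base, row in zip(NUCLEOTIDES, M):
    print(base, row)
```

Each row sums to zero (up to floating-point rounding): the rate of leaving a nucleotide balances the three rates of arriving at the others.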

Slide14

Simulating Jukes-Cantor sequence evolution

One nucleotide per sequence position

Simulating change as a finite difference using the rate equation would give fractional abundances at each position (population)

Need to convert the matrix of rates to transition probabilities

$$
P(t) = \{p_{ij}(t)\} = e^{Mt}
$$
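One way to see the conversion from rates to transition probabilities is to evaluate the matrix exponential numerically. The sketch below approximates e^(Mt) with a truncated Taylor series in plain Python; this is an illustrative choice, not necessarily how the original Excel or R models computed it. The result is compared against the closed-form Jukes-Cantor probabilities (1 + 3e^(-4at))/4 and (1 - e^(-4at))/4. The values a = 0.1 and t = 2.0 are arbitrary assumptions.

```python
import math


def mat_mul(x, y):
    """Product of two square matrices stored as nested lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mat_exp(m, t, terms=30):
    """Approximate e^(M t) by the truncated Taylor series sum over k of (M t)^k / k!."""
    n = len(m)
    mt = [[m[i][j] * t for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # k = 0 term
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_(k-1) * (M t) / k
        term = [[v / k for v in row] for row in mat_mul(term, mt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


a, t = 0.1, 2.0  # arbitrary example values
M = [[-3.0 * a if i == j else a for j in range(4)] for i in range(4)]
P = mat_exp(M, t)

print(P[0][0], (1.0 + 3.0 * math.exp(-4.0 * a * t)) / 4.0)  # probability of no net change
print(P[0][1], (1.0 - math.exp(-4.0 * a * t)) / 4.0)        # probability of one specific change
```

The series converges quickly here because the entries of Mt are small; for large t a dedicated routine would be a safer choice.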

Slide15

Simulating Jukes-Cantor sequence evolution

$$
P(t) =
\begin{pmatrix}
p_0(t) & p_1(t) & p_1(t) & p_1(t) \\
p_1(t) & p_0(t) & p_1(t) & p_1(t) \\
p_1(t) & p_1(t) & p_0(t) & p_1(t) \\
p_1(t) & p_1(t) & p_1(t) & p_0(t)
\end{pmatrix}
$$

with

$$
p_0(t) = \frac{1 + 3e^{-4at}}{4}, \qquad p_1(t) = \frac{1 - e^{-4at}}{4}
$$

Since each row sums to one, only one expression is needed
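With the transition probabilities in closed form, a sequence can be evolved one position at a time. The sketch below assumes each site changes independently: a site keeps its nucleotide with probability p0(t), and otherwise switches to one of the other three with equal chance, so each specific change has probability p1(t). Function names, the seed, and the parameter values are illustrative.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def p_same(a, t):
    """p0(t): probability a site shows the same nucleotide after time t."""
    return (1.0 + 3.0 * math.exp(-4.0 * a * t)) / 4.0


def p_change(a, t):
    """p1(t): probability a site shows one specific different nucleotide after time t."""
    return (1.0 - math.exp(-4.0 * a * t)) / 4.0


def evolve(sequence, a, t, rng):
    """Return a copy of `sequence` after time t, treating each site independently."""
    stay = p_same(a, t)
    evolved = []
    for base in sequence:
        if rng.random() < stay:
            evolved.append(base)
        else:
            # The three alternatives are equally likely under Jukes-Cantor.
            evolved.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(evolved)


rng = random.Random(7)  # fixed seed so the run is repeatable
ancestor = "CATTCAGATC"
print(ancestor)
print(evolve(ancestor, a=0.1, t=1.0, rng=rng))
```

Calling `evolve` twice on the same ancestor gives two daughter sequences, the basic step of a cell lineage tree.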

Slide16

Jukes-Cantor models

AgentSheets: cell lineage tree

Excel: cell lineage tree; two-sequence distance; probability vs. time

R: cell lineage tree vs. phylogenetic reconstruction
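For the two-sequence distance, a common approach is the Jukes-Cantor correction, which converts an observed proportional distance p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). This formula is the standard result for the model rather than something stated on the slides, and it follows from the transition probabilities p0(t) and p1(t). The sketch below applies it; the function name is illustrative.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from an observed proportional distance p."""
    if not 0.0 <= p < 0.75:
        # At p >= 3/4 the sequences look random relative to each other
        # and the logarithm is undefined.
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


p = 0.2  # e.g. 2 differences in 10 aligned positions
print(p, jukes_cantor_distance(p))
```

The corrected distance is larger than p because some positions have changed more than once, and those extra events are hidden in a simple count of differences.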

Slide17

References

Felsenstein, J. (2003). Inferring Phylogenies (2nd ed.). Sinauer Associates.

Nielsen, R. (2005). Statistical Methods in Molecular Evolution (1st ed.). Springer.

Yang, Z. (2006). Computational Molecular Evolution (p. 357). Oxford University Press.