Trevor Bedford (@trvrb)
May 26, 2020
GS541 Introduction to Computational Molecular Biology
Taxa A and B are separated by 1 mutation, and divergence is low relative to alignment length, i.e., 1 mutation in a 30 kb genome. The evolutionary rate is μ = 2 mutations per month.
This is a Poisson process in which $k$ events are observed over an unknown amount of time $t$ at a known rate $\mu$: $k \sim \mathrm{Poisson}(\mu \, t)$, so $\mathrm{Pr}(k|t) = \frac{(\mu \, t)^k e^{-\mu \, t}}{k!}$.
With $k = 1$ and $\mu = 2$, the likelihood of $t$ (in months) is $\mathrm{Pr}(k|t) = 2 \, t \, e^{-2 \, t}$
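As a quick check on the formula above, the sketch below evaluates the Poisson likelihood on a grid of $t$ values, using $k = 1$ and $\mu = 2$ from the text. The grid range (up to 3 months) and function name are illustrative choices, not part of the lecture. Setting the derivative of $2 t e^{-2t}$ to zero gives $t = k/\mu = 0.5$ months, and the grid maximum should agree.

```python
import math

def poisson_likelihood(k: int, mu: float, t: float) -> float:
    """Pr(k | t) for k ~ Poisson(mu * t)."""
    lam = mu * t
    return lam**k * math.exp(-lam) / math.factorial(k)

k, mu = 1, 2.0  # one mutation, 2 mutations per month (values from the text)

# Evaluate the likelihood of t on a grid from 0.001 to 3.0 months
grid = [i / 1000 for i in range(1, 3001)]
likelihoods = [poisson_likelihood(k, mu, t) for t in grid]
t_hat = grid[likelihoods.index(max(likelihoods))]

print(f"Pr(k=1 | t=0.5) = {poisson_likelihood(k, mu, 0.5):.4f}")
print(f"grid MLE of t: {t_hat} months (analytic k/mu = {k / mu})")
```

Note that this is a likelihood for $t$, not a probability density over $t$; turning it into a posterior would require a prior on $t$.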
For a full tree, these likelihoods are multiplied across branches and μ is estimated.
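A minimal sketch of multiplying likelihoods across branches, assuming branch durations are known and all branches share a single rate μ (a strict clock). The branch times and mutation counts are hypothetical values chosen for illustration. Summing per-branch Poisson log-likelihoods is the same as taking the log of their product, and for a shared rate the maximum-likelihood estimate has the closed form $\hat{\mu} = \sum k / \sum t$.

```python
import math

# Hypothetical per-branch data (NOT from the lecture): branch durations in
# months and the number of mutations observed on each branch.
branch_times = [0.5, 1.0, 2.0, 0.25]
branch_muts = [1, 3, 4, 0]

def log_likelihood(mu: float) -> float:
    """Sum of per-branch Poisson log-likelihoods, i.e. the log of their product."""
    total = 0.0
    for t, k in zip(branch_times, branch_muts):
        lam = mu * t
        total += k * math.log(lam) - lam - math.lgamma(k + 1)
    return total

# Closed-form MLE for a shared Poisson rate: total mutations / total time
mu_hat = sum(branch_muts) / sum(branch_times)

# Sanity check with a coarse grid search over mu from 0.01 to 10.0
grid = [i / 100 for i in range(1, 1001)]
mu_grid = max(grid, key=log_likelihood)

print(f"closed-form mu_hat = {mu_hat:.3f} mutations/month")
print(f"grid-search mu     = {mu_grid:.2f} mutations/month")
```

In real analyses the branch durations are themselves unknown and are estimated jointly with μ, which this sketch does not attempt.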
$$\mathrm{Pr}(\mathrm{coal} \, | \, i=2) = \frac{1}{N}$$
The first lineage picks an arbitrary parent with probability 1, while the second lineage picks the same parent with probability $\frac{1}{N}$.
The probability of coalescence therefore scales inversely with population size.
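The $\frac{1}{N}$ result can be checked by simulation, assuming a Wright-Fisher-style model in which each lineage picks its parent uniformly at random from the $N$ individuals of the previous generation. The population size $N = 50$, the number of trials, and the function name are hypothetical choices for illustration.

```python
import random

def simulate_pair_coalescence(N: int, trials: int, seed: int = 1) -> float:
    """Fraction of generations in which two lineages pick the same parent."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        parent_a = rng.randrange(N)  # first lineage: any parent is fine
        parent_b = rng.randrange(N)  # second lineage must match by chance
        hits += parent_a == parent_b
    return hits / trials

N = 50
est = simulate_pair_coalescence(N, trials=200_000)
print(f"simulated: {est:.4f}   expected 1/N: {1 / N:.4f}")
```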
$$\mathrm{Pr}(\mathrm{coal}) = \binom{i}{2} \frac{1}{N} = \frac{i(i-1)}{2N}$$
There are $\binom{i}{2}$ pairs of lineages, and each pair picks the same parent with probability $\frac{1}{N}$.
The probability of coalescence therefore scales quadratically with the number of lineages.
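Summing $\frac{1}{N}$ over pairs is an approximation that works when $i \ll N$, so that at most one pair is likely to coalesce per generation. The sketch below compares it with the exact probability that at least two of $i$ lineages share a parent, computed like the birthday problem under the same uniform-parent assumption. The value $N = 1000$ is hypothetical.

```python
from math import comb, prod

def approx_pr_coal(i: int, N: int) -> float:
    """Pairwise approximation from the text: C(i, 2) / N."""
    return comb(i, 2) / N

def exact_pr_coal(i: int, N: int) -> float:
    """Probability that at least two of i lineages share a parent
    (birthday-problem calculation, assuming uniform parent choice)."""
    all_distinct = prod((N - j) / N for j in range(i))
    return 1 - all_distinct

N = 1000
for i in (2, 5, 10, 20):
    print(f"i={i:2d}  approx={approx_pr_coal(i, N):.5f}  exact={exact_pr_coal(i, N):.5f}")
```

The two agree exactly for $i = 2$ and drift apart as $i$ grows relative to $N$, with the pairwise sum always slightly overestimating.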
$$\mathrm{E}[T_i] = \frac{2N}{i(i-1)}$$
The waiting time $T_i$ follows a geometric distribution: if there is a $\frac{1}{x}$ probability of an event in each generation, we expect to wait $x$ generations for the event to occur.
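To see the geometric waiting time in action, the sketch below repeatedly steps through generations until a coalescence occurs, with per-generation probability $\frac{i(i-1)}{2N}$, and compares the average wait to $\frac{2N}{i(i-1)}$. The values $N = 100$ and $i = 4$, the sample size, and the function name are hypothetical.

```python
import random

def waiting_time_generations(i: int, N: int, rng: random.Random) -> int:
    """Generations until a coalescence, with per-generation probability i(i-1)/(2N)."""
    p = i * (i - 1) / (2 * N)
    g = 1
    while rng.random() >= p:  # no coalescence this generation
        g += 1
    return g

rng = random.Random(42)
N, i = 100, 4
samples = [waiting_time_generations(i, N, rng) for _ in range(50_000)]
mean_sim = sum(samples) / len(samples)
print(f"simulated mean: {mean_sim:.2f}  expected 2N/(i(i-1)): {2 * N / (i * (i - 1)):.2f}")
```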
When the per-generation probability $\frac{1}{x}$ is small and the wait spans many generations, the discrete-time geometric distribution is well approximated by a continuous-time exponential distribution.
Thus, we assume $T_i$ to be exponentially distributed with mean
$$\mathrm{E}[T_i] = \frac{2N}{i(i-1)}.$$
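The geometric-to-exponential approximation can be checked by comparing survival functions: in discrete time $\Pr(T > g) = (1 - p)^g$, while the exponential with rate $p$ per generation gives $e^{-p g}$. The sketch assumes $i = 2$ and a hypothetical $N = 1000$, so $p = 0.001$ is small as the approximation requires.

```python
import math

def geometric_survival(p: float, g: int) -> float:
    """Pr(T > g) in discrete time: no event in each of g generations."""
    return (1 - p) ** g

def exponential_survival(p: float, g: float) -> float:
    """Pr(T > g) in continuous time with rate p per generation."""
    return math.exp(-p * g)

# Small per-generation probability: i = 2 lineages in a population of N = 1000
N, i = 1000, 2
p = i * (i - 1) / (2 * N)  # 0.001
for g in (100, 1000, 3000):
    print(f"g={g:5d}  geometric={geometric_survival(p, g):.4f}  exponential={exponential_survival(p, g):.4f}")
```

With $p$ this small the two curves are nearly indistinguishable; repeating the comparison with a large $p$ (say 0.5) shows where the continuous-time approximation breaks down.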
These approaches directly estimate the pairwise rate of coalescence $\lambda$, measured in events per year. The timescale of coalescence $\frac{1}{\lambda}$ is therefore the expected waiting time, in years, for two lineages to find a common ancestor.
The timescale of coalescence is $\frac{1}{\lambda} = N_e \tau$, where $N_e$ is measured in generations and $\tau$ in years per generation, so $\tau$ rescales time from generations to years.
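Rearranging $\frac{1}{\lambda} = N_e \tau$ gives $N_e = \frac{1}{\lambda \tau}$. The sketch below applies this with hypothetical inputs (a coalescence rate of 0.5 events per year and a one-week generation time); neither number comes from the lecture. Because $\lambda$ alone fixes only the product $N_e \tau$, a value for $\tau$ has to be supplied separately to recover $N_e$.

```python
def effective_population_size(coal_rate_per_year: float, years_per_generation: float) -> float:
    """Solve 1/lambda = N_e * tau for N_e (in generations)."""
    timescale_years = 1.0 / coal_rate_per_year  # expected pairwise waiting time, years
    return timescale_years / years_per_generation

# Hypothetical inputs, not from the lecture
lam = 0.5        # pairwise coalescence rate, events per year
tau = 1 / 52     # generation time of one week, in years per generation
print(f"1/lambda = {1 / lam:.1f} years, N_e = {effective_population_size(lam, tau):.1f}")
```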