Actual v. predicted distances of Atlantic cyclones
Machine learning
Statistical models
Nonlinear forecasting
"Mechanistic" modeling
Inference of gene regulatory networks
from knockout, observational, and synthetic data
Compare Lasso, random forests, Bayesian networks, mutual information, ANOVA, etc.
An artificial neural network that included microbial interactions performed best.
Reconstruct attractor ("library") from time series
Use attractor to make short-term predictions
Trajectories in chaotic attractors diverge
$$ \lvert \delta \mathbf{Z}(t)\rvert \approx e^{\lambda t}\lvert\delta\mathbf{Z}_0\rvert$$
$\lambda$ is the Lyapunov exponent
(so with chaos, $\lambda>0$)
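As a rough illustration of a positive versus negative $\lambda$, the sketch below estimates the exponent for the logistic map $x_{n+1} = r x_n (1 - x_n)$ by averaging $\log\lvert f'(x_n)\rvert$ along an orbit. The logistic map is not used in these slides; it, the function name `logistic_lyapunov`, and the parameter values are assumptions chosen only because the system is simple.

```python
import math

def logistic_lyapunov(r, x0=0.2, n_burn=1000, n_iter=20_000):
    """Estimate the Lyapunov exponent of x -> r*x*(1-x) (illustrative system)."""
    x = x0
    # Discard transients so the orbit settles onto the attractor.
    for _ in range(n_burn):
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n_iter):
        x = r * x * (1 - x)
        # |f'(x)| = |r*(1-2x)| is the local stretching factor;
        # the tiny floor guards against log(0) if x lands exactly on 0.5.
        total += math.log(max(abs(r * (1 - 2 * x)), 1e-300))
    return total / n_iter

lambda_chaotic = logistic_lyapunov(3.9)   # chaotic regime
lambda_periodic = logistic_lyapunov(3.2)  # stable period-2 cycle
print(lambda_chaotic, lambda_periodic)
```

A positive estimate means nearby trajectories separate exponentially on average; a negative one means they converge.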
An unsolved problem
Use $E$ (and $\tau$) that yield best predictions
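Choosing $E$ by forecast skill can be sketched as below. This is a minimal sketch, assuming a simplex-style nearest-neighbour forecast and a synthetic logistic-map series; the slides name neither, and the helper names (`delay_embed`, `simplex_forecast`, `pearson`) are hypothetical.

```python
import math

def delay_embed(series, E, tau=1):
    """Return (t, vector) pairs with vector = (x_t, x_{t-tau}, ..., x_{t-(E-1)tau})."""
    start = (E - 1) * tau
    return [(t, tuple(series[t - k * tau] for k in range(E)))
            for t in range(start, len(series))]

def simplex_forecast(library, target, E, tau=1):
    """One-step-ahead predictions for `target` from the E+1 nearest library points."""
    lib = [(t, v) for t, v in delay_embed(library, E, tau) if t + 1 < len(library)]
    preds, obs = [], []
    for t, v in delay_embed(target, E, tau):
        if t + 1 >= len(target):
            continue
        nearest = sorted((math.dist(v, u), s) for s, u in lib)[:E + 1]
        d_min = max(nearest[0][0], 1e-12)
        # Exponential weights: closer neighbours count more.
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        pred = sum(w * library[s + 1] for w, (_, s) in zip(weights, nearest)) / sum(weights)
        preds.append(pred)
        obs.append(target[t + 1])
    return preds, obs

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Synthetic chaotic series (assumption: logistic map with r = 3.9).
x, series = 0.3, []
for _ in range(400):
    x = 3.9 * x * (1 - x)
    series.append(x)
library, target = series[:200], series[200:]

# Forecast skill (correlation of predicted vs observed) for each candidate E.
skill = {E: pearson(*simplex_forecast(library, target, E)) for E in range(1, 6)}
best_E = max(skill, key=skill.get)
print(skill, best_E)
```

The same loop could also scan over $\tau$; it is held at 1 here to keep the sketch short.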
where $I(s)$ is the incidence in week $s$, $h$ is a strain-specific incidence threshold, $T$ is the time of crossing $h$, and $Y$ is the strain's whole-season cumulative incidence proxy.
Aim: Predict peak timing and magnitude
Achieves 37% accuracy with 1-3 week lead, ~50% at 0 week lead
The general strategy for antigenically evolving seasonal influenza viruses is to match the vaccine strain as closely as possible to circulating viruses
Due to manufacturing and distribution lead times, vaccine strain selection occurs in February for a fall vaccination campaign ahead of the seasonal influenza epidemic
Future frequency $x_i(t+\Delta t)$ of strain $i$ derives from strain fitness $f_i$ and present day frequency $x_i(t)$, such that
$$x_i(t+\Delta t) = \frac{1}{Z(t)} \, x_i(t) \, \mathrm{exp}(f_i \, \Delta t)$$
Strain frequencies at each timepoint are normalized by total frequency $Z(t)$. This captures clonal interference between competing lineages.
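The projection step can be illustrated with a few lines of code. This is a minimal sketch: the strain names, frequencies, and fitness values are made up, and `project_frequencies` is a hypothetical helper, not part of any published pipeline.

```python
import math

def project_frequencies(freqs, fitness, dt):
    """Apply x_i(t+dt) = x_i(t) * exp(f_i * dt) / Z(t) to a dict of strain frequencies."""
    unnormalized = {s: x * math.exp(fitness[s] * dt) for s, x in freqs.items()}
    Z = sum(unnormalized.values())  # normalization keeps frequencies summing to 1
    return {s: v / Z for s, v in unnormalized.items()}

# Hypothetical present-day frequencies and fitnesses for three strains.
freqs = {"A": 0.6, "B": 0.3, "C": 0.1}
fitness = {"A": -0.5, "B": 0.0, "C": 1.0}
projected = project_frequencies(freqs, fitness, dt=1.0)
print(projected)
```

Because of the shared normalization, a strain with positive fitness can still decline in frequency if a competitor is fitter, which is how the model expresses clonal interference.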
The fitness $f$ of strain $i$ is estimated as
$$f_i = \beta^\mathrm{A} \, f_i^\mathrm{A} + \beta^\mathrm{B} \, f_i^\mathrm{B} + \ldots$$
where $f^\mathrm{A}$, $f^\mathrm{B}$, etc. are different standardized viral attributes and the coefficients $\beta^\mathrm{A}$, $\beta^\mathrm{B}$, etc. are trained on historical evolution
Antigenic drift | Intrinsic fitness | Recent growth |
---|---|---|
epitope mutations | non-epitope mutations | local branching index |
HI titers | DMS data (via Bloom lab) | delta frequency |
Strain fitness $f_i$ is largely fixed by the "fundamentals" of the strain rather than being learned from frequency behavior.
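The composite fitness $f_i = \beta^\mathrm{A} f_i^\mathrm{A} + \beta^\mathrm{B} f_i^\mathrm{B} + \ldots$ can be sketched as a weighted sum of z-scored attributes. The attribute values and coefficients below are invented for illustration; in the actual model the coefficients are trained on historical evolution, which this sketch does not attempt.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score a list of raw attribute values across strains."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def strain_fitness(attributes, betas):
    """attributes: {name: [raw value per strain]}; betas: {name: coefficient}."""
    z = {name: standardize(vals) for name, vals in attributes.items()}
    n_strains = len(next(iter(attributes.values())))
    return [sum(betas[name] * z[name][i] for name in attributes)
            for i in range(n_strains)]

# Hypothetical raw attributes for three strains and hypothetical coefficients.
attributes = {
    "antigenic_drift": [0.5, 2.0, 3.5],
    "recent_growth": [-0.1, 0.0, 0.4],
}
betas = {"antigenic_drift": 1.0, "recent_growth": 0.5}
fitness = strain_fitness(attributes, betas)
print(fitness)
```

Standardizing each attribute first puts the coefficients on a common scale, so their magnitudes can be compared across attributes.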
$x' = \frac{x \, (1+s)}{x \, (1+s) + (1-x)}$ for frequency $x$ in one generation with selective advantage $s$
$x(t) = \frac{x_0 \, (1+s)^t}{x_0 \, (1+s)^t + (1-x_0)}$ for initial frequency $x_0$ over $t$ generations
Trajectories are linear once logit transformed via $\mathrm{log}(\frac{x}{1 - x})$
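A quick numerical check of these three statements, with an arbitrary starting frequency and selective advantage (both values are assumptions for illustration):

```python
import math

def next_frequency(x, s):
    """One generation of selection: x' = x(1+s) / (x(1+s) + (1-x))."""
    return x * (1 + s) / (x * (1 + s) + (1 - x))

def frequency_after(x0, s, t):
    """Closed form after t generations."""
    grown = x0 * (1 + s) ** t
    return grown / (grown + (1 - x0))

def logit(x):
    return math.log(x / (1 - x))

x0, s = 0.01, 0.1
trajectory = [x0]
for _ in range(50):
    trajectory.append(next_frequency(trajectory[-1], s))

# Per-generation change in logit frequency; each should equal log(1+s).
logit_steps = [logit(b) - logit(a) for a, b in zip(trajectory, trajectory[1:])]
print(trajectory[-1], frequency_after(x0, s, 50), logit_steps[0], math.log(1 + s))
```

Iterating the one-generation update reproduces the closed form, and the logit increases by the constant $\log(1+s)$ each generation, which is why the transformed trajectory is a straight line.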
Multinomial logistic regression models the probability of a virus sampled at time $t$ belonging to variant $i$ as
$$\mathrm{Pr}(X = i) = x_i(t) = \frac{p_i \, \mathrm{exp}(f_i \, t)}{\sum_{1 \le j \le n} p_j \, \mathrm{exp}(f_j \, t) }$$
where, for $n$ variants, the model has $2n$ parameters: $p_i$, the frequency of variant $i$ at the initial timepoint, and $f_i$, the growth rate (fitness) of variant $i$.
The model is fit by minimizing the "log loss" between predicted and observed variant across all observations in the dataset.
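The fitting step can be sketched with plain gradient descent on the log loss. This is a minimal sketch under several assumptions: the weekly counts are synthetic, the intercepts are parameterized as $a_i = \log p_i$ so that the frequencies are a softmax of $a_i + f_i t$, and the learning rate and step count are arbitrary. Production analyses would typically use an established optimizer or statistics package instead.

```python
import math

def softmax(zs):
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]

def variant_freqs(a, f, t):
    """x_i(t) = p_i exp(f_i t) / sum_j p_j exp(f_j t), with a_i = log p_i."""
    return softmax([ai + fi * t for ai, fi in zip(a, f)])

def log_loss(a, f, times, counts):
    """Mean negative log-likelihood per sequenced virus."""
    n_obs = sum(sum(row) for row in counts)
    loss = 0.0
    for t, row in zip(times, counts):
        x = variant_freqs(a, f, t)
        loss -= sum(c * math.log(xi) for c, xi in zip(row, x))
    return loss / n_obs

def fit(times, counts, lr=0.1, steps=5000):
    n = len(counts[0])
    n_obs = sum(sum(row) for row in counts)
    a, f = [0.0] * n, [0.0] * n
    for _ in range(steps):
        grad_a, grad_f = [0.0] * n, [0.0] * n
        for t, row in zip(times, counts):
            x = variant_freqs(a, f, t)
            n_t = sum(row)
            for i in range(n):
                resid = (n_t * x[i] - row[i]) / n_obs  # expected minus observed count
                grad_a[i] += resid
                grad_f[i] += resid * t
        a = [ai - lr * g for ai, g in zip(a, grad_a)]
        f = [fi - lr * g for fi, g in zip(f, grad_f)]
    return a, f

# Synthetic weekly counts for three variants (columns), weeks 0..5 (rows).
times = [0, 1, 2, 3, 4, 5]
counts = [[80, 15, 5], [70, 20, 10], [58, 24, 18],
          [44, 26, 30], [30, 24, 46], [18, 20, 62]]

initial_loss = log_loss([0.0] * 3, [0.0] * 3, times, counts)
a_hat, f_hat = fit(times, counts)
final_loss = log_loss(a_hat, f_hat, times, counts)
p_hat = softmax(a_hat)  # estimated initial frequencies p_i
print(initial_loss, final_loss, p_hat, f_hat)
```

Only differences between the $f_i$ (and ratios of the $p_i$) are identifiable, so fitted growth rates are best read relative to a reference variant, for example `f_hat[2] - f_hat[0]`.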
Multinomial logistic regression should work well for SARS-CoV-2 prediction, except new variants have been emerging fast enough that the prediction horizon is really quite short