Forecasting

Forecasting is a big problem

Rapid progress

Actual v. predicted distances of Atlantic cyclones

Gandon et al. 2016

General, overlapping approaches

Machine learning

Statistical models

Nonlinear forecasting

"Mechanistic" modeling

DREAM challenges

Inference of gene regulatory networks

from knockout, observational, and synthetic data

Kuffner et al. 2012

Compete Lasso, random forests, Bayesian networks, mutual information, ANOVA, etc.

Ecological niche modeling

Larsen et al. 2012

An artificial neural network that included microbial interactions performed best.

Nonlinear forecasting

Reconstruct attractor ("library") from time series

Use attractor to make short-term predictions

Sugihara and May 1990

Forecasting communities

Deyle et al. 2015

Dynamics may be chaotic

Predictions with chaos: short shelf life

Trajectories in chaotic attractors diverge

$$ \lvert \delta \textbf{Z}(t)\rvert \approx e^{\lambda t}\lvert\delta\textbf{Z}_0\rvert$$

$\lambda$ is the Lyapunov exponent

(so with chaos, $\lambda>0$)

How to forecast

Choose an embedding dimension $E$ and lag $\tau$
Each point in $E$-dimensional space: $\{x_t,x_{t-\tau},x_{t-2\tau},...,x_{t-(E-1)\tau}\}$
Construct these points from the time series
Define a point to predict ("predictee")
See where predictee's $E+1$ nearest neighbors wind up $t$ steps into the future
Measure correlations $\rho$ between predictee's observed future state and neighbors' weighted predictions

Choosing $E$ and $\tau$

An unsolved problem

Use $E$ (and $\tau$) that yield best predictions

Sugihara and May 1990

Predict based on cumulative incidence

where $I(s)$ is the incidence in week $s$, $h$ is a strain-specific incidence threshold, $T$ is the time of crossing $h$, and $Y$ is the strain's whole-season cumulative incidence proxy.

Goldstein et al. 2011

Predictions v. observations (H3N2)

Goldstein et al. 2011

Predicting flu in Hong King

Aim: Predict peak timing and magnitude

Yang et al. 2015

Mechanistic model and particle filter

Yang et al. 2015

Achieves 37% accuracy with 1-3 week lead, ~50% at 0 week lead

Epidemic Prediction Initiative

Forecasting SARS-CoV-2

CNN

Unusual error

A statistical model

Some precedent: Farr's Law

(U.S.) COVID-19 Forecast Hub
see Cramer et al. 2022

Comparing models

Cramer et al. 2022

Evaluating performance

Cramer et al. 2022

Failure to predict change

Reich et al. 2021

Evolutionary forecasting

Influenza vaccine strain selection strategy

General strategy for antigenically evolving seasonal influenza viruses is attempt to match vaccine strain as close as possible to circulating viruses

nextstrain.org

Vaccine strain selection timeline

Due to manufacturing and distribution, vaccine strain selection occurs in Feb for an fall vaccination campaign ahead of seasonal influenza epidemic

Fitness models project strain frequencies

Future frequency $x_i(t+\Delta t)$ of strain $i$ derives from strain fitness $f_i$ and present day frequency $x_i(t)$, such that

$$x_i(t+\Delta t) = \frac{1}{Z(t)} \, x_i(t) \, \mathrm{exp}(f_i \, \Delta t)$$

Strain frequencies at each timepoint are normalized by total frequency $Z(t)$. This captures clonal interference between competing lineages.

Łuksza and Lässig. 2014

Match strain forecast to retrospective circulation

Huddleston et al. 2020

Strain fitness estimated from viral attributes

The fitness $f$ of strain $i$ is estimated as

$$f_i = \beta^\mathrm{A} \, f_i^\mathrm{A} + \beta^\mathrm{B} \, f_i^\mathrm{B} + \ldots$$

where $f^A$, $f^B$, etc... are different standardized viral attributes and $\beta^A$, $\beta^B$, etc... coefficients are trained based on historical evolution

Antigenic drift	Intrinsic fitness	Recent growth
epitope mutations	non-epitope mutations	local branching index
HI titers	DMS data (via Bloom lab)	delta frequency

Model successfully predicts clade growth and best pick from model is generally close to best possible retrospective pick

Huddleston et al. 2020. eLife.

Main issue

Strain fitness $f_i$ is largely fixed by the "fundamentals" of the strain rather than being learned from frequency behavior.

Genetic relationships of globally sampled SARS-CoV-2 to present

nextstrain.org

Rapid displacement of existing diversity by emerging variants

nextstrain.org

Population genetic expectation of variant frequency under selection

$x' = \frac{x \, (1+s)}{x \, (1+s) + (1-x)}$ for frequency $x$ in one generation with selective advantage $s$

$x(t) = \frac{x_0 \, (1+s)^t}{x_0 \, (1+s)^t + (1-x_0)}$ for initial frequency $x_0$ over $t$ generations

Trajectories are linear once logit transformed via $\mathrm{log}(\frac{x}{1 - x})$

Variants show consistent frequency dynamics in logit space

Multinomial logistic regression

Multinomial logistic regression models the probability of a virus sampled at time $t$ belonging to variant $i$ as

$$\mathrm{Pr}(X = i) = x_i(t) = \frac{p_i \, \mathrm{exp}(f_i \, t)}{\sum_{1 \le j \le n} p_j \, \mathrm{exp}(f_j \, t) }$$

where the model has $2n$ parameters consisting of $p_i$ the frequency of variant $i$ at initial timepoint and $f_i$ the growth rate or fitness of variant $i$ for $n$ variants.

The model is fit to minimize "log loss" of predicted variant vs observed variant across observations in dataset.

Multinomial logistic regression fits variant frequencies well

Original VOC viruses had substantially increased transmissibility

Model from Figgins and Bedford. 2022. medRxiv.

Clade and lineage forecasts continuously updated

Figgins, Lee, Hadfield. nextstrain.org

Multinomial logistic regression should work well for SARS-CoV-2 prediction, except new variants have been emerging fast enough that the prediction horizon is really quite short

Forecasting

Forecasting is a big problem

Rapid progress

General, overlapping approaches

DREAM challenges

Ecological niche modeling

Nonlinear forecasting

Forecasting communities

Dynamics may be chaotic

Predictions with chaos: short shelf life

How to forecast

Choosing $E$ and $\tau$

Observational noise v. chaos

Predicting flu

Epidemic sizes negatively correlated

Predict based on cumulative incidence

Predictions v. observations (H3N2)

Predicting flu in Hong King

Mechanistic model and particle filter

Epidemic Prediction Initiative

Forecasting SARS-CoV-2

Unusual error

A statistical model

Some precedent: Farr's Law

(U.S.) COVID-19 Forecast Hub see Cramer et al. 2022

Comparing models

Evaluating performance

Failure to predict change

Evolutionary forecasting

Influenza vaccine strain selection strategy

Vaccine strain selection timeline

Fitness models project strain frequencies

Match strain forecast to retrospective circulation

Strain fitness estimated from viral attributes

Model successfully predicts clade growth and best pick from model is generally close to best possible retrospective pick

Main issue

Genetic relationships of globally sampled SARS-CoV-2 to present

Rapid displacement of existing diversity by emerging variants

Population genetic expectation of variant frequency under selection

Variants show consistent frequency dynamics in logit space

Multinomial logistic regression

Multinomial logistic regression fits variant frequencies well

Original VOC viruses had substantially increased transmissibility

Clade and lineage forecasts continuously updated

What other models would you test?

What limits prediction with other pathogens?

(U.S.) COVID-19 Forecast Hub
see Cramer et al. 2022