### Forecasting is a big problem

### Rapid progress

Actual v. predicted distances of Atlantic cyclones

### General, overlapping approaches

Machine learning

Statistical models

Nonlinear forecasting

"Mechanistic" modeling

### DREAM challenges

Inference of gene regulatory networks

from knockout, observational, and synthetic data

Compete Lasso, random forests, Bayesian networks, mutual information, ANOVA, etc.

### Ecological niche modeling

An artificial neural network that included microbial interactions performed best.

### Nonlinear forecasting

Reconstruct attractor ("library") from time series

Use attractor to make short-term predictions

### Forecasting communities

### Dynamics may be chaotic

### Predictions with chaos: short shelf life

Trajectories in chaotic attractors diverge

$$ \lvert \delta \textbf{Z}(t)\rvert \approx e^{\lambda t}\lvert\delta\textbf{Z}_0\rvert$$

$\lambda$ is the Lyapunov exponent

(so with chaos, $\lambda>0$)

### How to forecast

- Choose an embedding dimension $E$ and lag $\tau$
- Each point in $E$-dimensional space: $\{x_t,x_{t-\tau},x_{t-2\tau},...,x_{t-(E-1)\tau}\}$
- Construct these points from the time series
- Define a point to predict ("predictee")
- See where predictee's $E+1$ nearest neighbors wind up $t$ steps into the future
- Measure correlations $\rho$ between predictee's observed future state and neighbors' weighted predictions

### Choosing $E$ and $\tau$

An unsolved problem

Use $E$ (and $\tau$) that yield best predictions

### Observational noise v. chaos

### Predicting flu

### Epidemic sizes negatively correlated

### Predict based on cumulative incidence

where $I(s)$ is the incidence in week $s$, $h$ is a strain-specific incidence threshold, $T$ is the time of crossing $h$, and $Y$ is the strain's whole-season cumulative incidence proxy.

### Predictions v. observations (H3N2)

### Predicting flu in Hong King

Aim: Predict peak timing and magnitude

### Mechanistic model and particle filter

Achieves 37% accuracy with 1-3 week lead, ~50% at 0 week lead

### Forecasting SARS-CoV-2

### Unusual error

### A statistical model

### Some precedent: Farr's Law

### Comparing models

### Evaluating performance

### Forecasting flu evolution

### Vaccine strain selection timeline

### Seek to explain change in clade frequencies over 1 year

### Fitness models can project clade frequencies

Clade frequencies $X$ derive from the fitnesses $f$ and frequencies $x$ of constituent viruses, such that

$$\hat{X}_v(t+\Delta t) = \sum_{i:v} x_i(t) \, \mathrm{exp}(f_i \, \Delta t)$$

This captures clonal interference between competing lineages

### Clonal interference

### Predictive fitness models

A simple predictive model estimates the fitness $f$ of virus $i$ as

$$\hat{f}_i = \beta^\mathrm{ep} \, f_i^\mathrm{ep} + \beta^\mathrm{ne} \, f_i^\mathrm{ne}$$

where $f_i^\mathrm{ep}$ measures cross-immunity via substitutions at epitope sites and $f_i^\mathrm{ne}$ measures mutational load via substitutions at non-epitope sites

### Model selection

### Performance

### Emergence and spread of SARS-CoV-2 variant viruses

### Emergence and spread of SARS-CoV-2 variant viruses

### Logistic growth

Exponential growth of fitter variant appears as logistic growth in frequency space

Logistic regression: $$p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

### Multinomial logistic regression fit

### What other models would you test?

### What limits prediction with other pathogens?