## Evolutionary forecasting for SARS-CoV-2

#### Trevor Bedford

Fred Hutchinson Cancer Center / Howard Hughes Medical Institute
11 Jul 2023
Monthly Meeting
Northwest PGCoE

Slides at: bedford.io/talks

# Variant frequency dynamics

### Population genetic expectation of variant frequency under selection

$x' = \frac{x \, (1+s)}{x \, (1+s) + (1-x)}$ for frequency $x$ over one generation with selective advantage $s$

$x(t) = \frac{x_0 \, (1+s)^t}{x_0 \, (1+s)^t + (1-x_0)}$ for initial frequency $x_0$ over $t$ generations
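
As a quick numerical check, the sketch below iterates the one-generation recursion and compares the result with the closed form at every generation. The starting frequency $x_0 = 0.01$ and advantage $s = 0.1$ are hypothetical values chosen for illustration.

```python
def one_generation(x: float, s: float) -> float:
    """One round of selection: x' = x(1+s) / (x(1+s) + (1-x))."""
    w = x * (1 + s)
    return w / (w + (1 - x))


def closed_form(x0: float, s: float, t: int) -> float:
    """Frequency after t generations: x(t) = x0(1+s)^t / (x0(1+s)^t + (1-x0))."""
    w = x0 * (1 + s) ** t
    return w / (w + (1 - x0))


# Hypothetical example values: 1% starting frequency, 10% advantage per generation
x0, s = 0.01, 0.1
x = x0
trajectory = [x0]
for t in range(1, 51):
    x = one_generation(x, s)
    trajectory.append(x)

# Iterating the recursion agrees with the closed form at every generation
max_gap = max(abs(trajectory[t] - closed_form(x0, s, t)) for t in range(51))
print(f"frequency after 50 generations: {trajectory[-1]:.3f}; max gap {max_gap:.1e}")
```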

Trajectories are linear in $t$ once logit transformed via $\mathrm{log}(\frac{x}{1 - x})$, with slope $\mathrm{log}(1+s)$ per generation
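
Because the logit of $x(t)$ changes by $\mathrm{log}(1+s)$ each generation, a straight-line fit to logit-transformed frequencies recovers $s$. This is a minimal sketch that assumes a noise-free trajectory with hypothetical $x_0 = 0.01$ and $s = 0.1$. Real frequency data would be noisy and are better handled by a proper regression model such as MLR.

```python
import math


def logit(x: float) -> float:
    return math.log(x / (1 - x))


def ols_slope(ts, ys):
    """Ordinary least-squares slope of ys regressed on ts."""
    t_bar = sum(ts) / len(ts)
    y_bar = sum(ys) / len(ys)
    num = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    den = sum((t - t_bar) ** 2 for t in ts)
    return num / den


# Hypothetical noise-free trajectory from the closed form with x0 = 0.01, s = 0.1
x0, s = 0.01, 0.1
ts = list(range(20))
freqs = [x0 * (1 + s) ** t / (x0 * (1 + s) ** t + (1 - x0)) for t in ts]

# In logit space the points lie on a line whose slope is log(1 + s)
slope = ols_slope(ts, [logit(x) for x in freqs])
s_hat = math.exp(slope) - 1
print(f"slope = {slope:.4f}, recovered s = {s_hat:.4f}")
```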

### Multinomial logistic regression

Multinomial logistic regression across $n$ variants models the probability of a virus sampled at time $t$ belonging to variant $i$ as

$$\mathrm{Pr}(X = i) = x_i(t) = \frac{p_i \, \mathrm{exp}(f_i \, t)}{\sum_{1 \le j \le n} p_j \, \mathrm{exp}(f_j \, t) }$$

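with $2n$ parameters: $p_i$, the frequency of variant $i$ at the initial timepoint, and $f_i$, the growth rate or fitness of variant $i$. Only ratios of the $p_i$ and differences of the $f_i$ affect $x_i(t)$, so $f_i - f_j$ can be read as the growth advantage of variant $i$ over variant $j$.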
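
To make the formula concrete, this sketch evaluates $x_i(t)$ for hypothetical parameter values. It works in log space and subtracts the largest term before exponentiating, a standard trick to avoid overflow when $f_i t$ is large. The function name `mlr_frequencies` is our own and is not part of any package.

```python
import math


def mlr_frequencies(p, f, t):
    """x_i(t) = p_i exp(f_i t) / sum_j p_j exp(f_j t), computed in log space."""
    logits = [math.log(p_i) + f_i * t for p_i, f_i in zip(p, f)]
    m = max(logits)  # subtract the max before exponentiating to avoid overflow
    weights = [math.exp(z - m) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical three-variant example; p sums to 1, so x(0) = p
p = [0.7, 0.2, 0.1]
f = [0.0, 0.05, 0.1]  # hypothetical per-day growth rates
for t in (0, 30, 60):
    print(t, [round(x, 3) for x in mlr_frequencies(p, f, t)])

# Adding the same constant to every f_i leaves the frequencies unchanged
shifted = mlr_frequencies(p, [fi + 1.0 for fi in f], 30)
```

With two variants and time measured in generations, $e^{(f_2 - f_1)t}$ plays the role of $(1+s)^t$ in the closed-form selection model, so a per-generation advantage $s$ corresponds to $f_2 - f_1 = \mathrm{log}(1+s)$.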

The model is fit by minimizing the "log loss" (negative log-likelihood) of the predicted versus observed variant across all observations in the dataset.
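
Below is a minimal pure-Python sketch of that fit. It assumes plain gradient descent on the mean log loss, with variant 0 held as a fixed reference. This is an illustrative stand-in, not how evofr performs inference. The data are hypothetical expected counts, chosen so that the true parameters are exactly the optimum.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fit_mlr(times, counts, lr=0.2, steps=5000):
    """Fit an MLR model by plain gradient descent on the mean log loss.

    counts[k][i] is the number of sequences of variant i observed at times[k].
    Variant 0 is a reference whose log p and f stay at 0, since only ratios
    of p and differences of f are identifiable.
    Returns (p, f): p normalized to sum to 1, f relative to variant 0.
    """
    n_var = len(counts[0])
    n_total = sum(sum(row) for row in counts)
    log_p = [0.0] * n_var
    f = [0.0] * n_var
    for _ in range(steps):
        grad_p = [0.0] * n_var
        grad_f = [0.0] * n_var
        for t, row in zip(times, counts):
            n_t = sum(row)
            x = softmax([log_p[i] + f[i] * t for i in range(n_var)])
            for i in range(1, n_var):
                # d(mean log loss)/d(logit_i) = (expected - observed) / total
                r = (n_t * x[i] - row[i]) / n_total
                grad_p[i] += r
                grad_f[i] += r * t
        for i in range(1, n_var):
            log_p[i] -= lr * grad_p[i]
            f[i] -= lr * grad_f[i]
    return softmax(log_p), f


# Hypothetical data: variant 1 starts at 10% with a growth advantage of 0.2 per day
times = list(range(10))
counts = []
for t in times:
    x = softmax([math.log(0.9), math.log(0.1) + 0.2 * t])
    counts.append([100 * xi for xi in x])  # expected counts, 100 sequences per day

p_hat, f_hat = fit_mlr(times, counts)
print([round(v, 4) for v in p_hat], [round(v, 4) for v in f_hat])
```

The fixed step size `lr=0.2` suits this small example only. Rescaling time or switching to a library optimizer would be more robust for real data.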

### MLR implemented in the evofr package

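| location | variant | date       | sequences |
|----------|---------|------------|-----------|
| Japan    | 22B     | 2023-02-10 | 242       |
| Japan    | 22B     | 2023-02-11 | 56        |
| Japan    | 22B     | 2023-02-12 | 70        |
| Japan    | 22E     | 2023-02-10 | 80        |
| Japan    | 22E     | 2023-02-11 | 21        |
| Japan    | 22E     | 2023-02-12 | 27        |
| USA      | 22B     | 2023-02-10 | 41        |
| USA      | 22B     | 2023-02-11 | 23        |
| USA      | 22B     | 2023-02-12 | 23        |
| USA      | 22E     | 2023-02-10 | 368       |
| USA      | 22E     | 2023-02-11 | 236       |
| USA      | 22E     | 2023-02-12 | 246       |
| ...      | ...     | ...        | ...       |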
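
evofr's own data-loading classes are not shown here. As a plain-Python sketch of the data shape the model consumes, the helper below (`to_count_matrices`, a hypothetical name) pivots the long-format rows of the table into one date-by-variant count matrix per location. It uses only the rows shown in the table.

```python
# Rows from the table: (location, variant, date, sequences)
rows = [
    ("Japan", "22B", "2023-02-10", 242),
    ("Japan", "22B", "2023-02-11", 56),
    ("Japan", "22B", "2023-02-12", 70),
    ("Japan", "22E", "2023-02-10", 80),
    ("Japan", "22E", "2023-02-11", 21),
    ("Japan", "22E", "2023-02-12", 27),
    ("USA", "22B", "2023-02-10", 41),
    ("USA", "22B", "2023-02-11", 23),
    ("USA", "22B", "2023-02-12", 23),
    ("USA", "22E", "2023-02-10", 368),
    ("USA", "22E", "2023-02-11", 236),
    ("USA", "22E", "2023-02-12", 246),
]


def to_count_matrices(rows):
    """Pivot long-format rows into one (dates x variants) count matrix per location."""
    variants = sorted({var for _, var, _, _ in rows})
    col = {var: j for j, var in enumerate(variants)}
    per_loc = {}
    for loc, var, date, n in rows:
        by_date = per_loc.setdefault(loc, {})
        by_date.setdefault(date, [0] * len(variants))[col[var]] += n
    matrices = {}
    for loc, by_date in per_loc.items():
        dates = sorted(by_date)  # ISO dates sort chronologically as strings
        matrices[loc] = (dates, [by_date[d] for d in dates])
    return variants, matrices


variants, matrices = to_count_matrices(rows)
dates, usa = matrices["USA"]
share_22e = [row[1] / sum(row) for row in usa]  # observed 22E frequency per day
print(variants, dates, [round(x, 3) for x in share_22e])
```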


# Evolution driving epidemics

Data from UKHSA

### ONS Infection Survey provides rare source of ground truth

Roughly 1 in 3 infections were detected in 2021, compared with 1 in 40 in 2023
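
As a worked illustration, assume a hypothetical 1,000 detected infections. These ascertainment ratios imply very different underlying totals:

$$1{,}000 \times 3 = 3{,}000 \ \text{infections (2021)} \qquad 1{,}000 \times 40 = 40{,}000 \ \text{infections (2023)}$$

The same number of detected infections therefore corresponds to roughly $40/3 \approx 13$ times as many infections in 2023 as in 2021.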

Data from ONS

### ~110% population attack rate from March 2022 to March 2023

Data from UKHSA and ONS
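
An attack rate above 100% is possible because it counts cumulative infections, reinfections included, relative to population size $N$:

$$\text{attack rate} = \frac{\sum_t I_t}{N}$$

Here $I_t$ is the estimated number of new infections in period $t$. A value of ~110% means about 1.1 infections per person on average over the year, which implies at least $0.1N$ reinfections.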

### Post-Omicron period shows consistent infection fatality rate (IFR) of 0.04%

Data from UKHSA and ONS
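
The IFR is the ratio of deaths to infections:

$$\mathrm{IFR} = \frac{\text{deaths}}{\text{infections}}$$

An IFR of 0.04% corresponds to about 4 deaths per 10,000 infections, or 1 in 2,500. As an illustrative combination with the ~110% annual attack rate, a hypothetical population of 1,000,000 would see about $1{,}000{,}000 \times 1.1 \times 0.0004 = 440$ deaths over the year.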

# Forecasting

### Hierarchical MLR model

The hierarchical model pools growth advantages across locations. This lets us include locations with fewer sequences and better estimate the growth advantages of rare lineages.

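| Level        | Initial frequencies      | Growth rates             |
|--------------|--------------------------|--------------------------|
| Japan        | $p_{23A}$, $p_{23B}$     | $f_{23A}$, $f_{23B}$     |
| USA          | $p_{23A}$, $p_{23B}$     | $f_{23A}$, $f_{23B}$     |
| Hierarchical |                          | $f_{23A}$, $f_{23B}$     |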
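
The hierarchical MLR estimates all of these parameters jointly. As a simpler stand-in that shows why pooling helps sparse locations, the sketch below applies normal-normal partial pooling (empirical-Bayes-style shrinkage with a fixed between-location spread `tau`) to separately fitted per-location growth advantages. This is not the evofr hierarchical model, and all numbers are hypothetical.

```python
def partial_pool(estimates, std_errors, tau):
    """Normal-normal partial pooling of per-location growth advantages.

    estimates[l], std_errors[l]: independent estimates for location l.
    tau: assumed (hypothetical, fixed) spread of true advantages across locations.
    Returns (pooled_mean, shrunken_estimates).
    """
    # Precision-weighted shared mean; each weight includes the between-location spread
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    mu = sum(w * est for w, est in zip(weights, estimates)) / sum(weights)
    # Posterior mean per location given mu: a precision-weighted average of its
    # own estimate and the shared mean, so noisy locations move the most
    shrunk = [
        (est / se ** 2 + mu / tau ** 2) / (1 / se ** 2 + 1 / tau ** 2)
        for est, se in zip(estimates, std_errors)
    ]
    return mu, shrunk


# Hypothetical values: location 0 has few sequences, hence a large standard error
estimates = [0.30, 0.10, 0.12]
std_errors = [0.20, 0.02, 0.03]
mu, shrunk = partial_pool(estimates, std_errors, tau=0.03)
print(round(mu, 3), [round(v, 3) for v in shrunk])
```

The sparse location's noisy estimate is pulled strongly toward the shared mean, while well-sampled locations barely move.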

### Could we predict the spread of new mutations using deep mutational scanning (DMS) data?

Escape from antibodies that potently neutralize BA.2

### Continued research

- Application of MLR models to other pathogens, such as seasonal influenza
- Assessing and improving the accuracy of "live" models at nextstrain.org/sars-cov-2/forecasts/
- Implementing DMS priors to predict the fitness of emerging and yet-to-emerge lineages

### Acknowledgements

SARS-CoV-2 genomic epi: Data producers from all over the world, GISAID and the Nextstrain team

Bedford Lab: John Huddleston, James Hadfield, Katie Kistler, Thomas Sibley, Jover Lee, Cassia Wagner, Miguel Paredes, Nicola Müller, Marlin Figgins, Victor Lin, Jennifer Chang, Allison Li, Eslam Abousamra, Donna Modrell, Nashwa Ahmed, Cécile Tran Kiem