Fred Hutchinson Cancer Center / Howard Hughes Medical Institute

14 Dec 2023

Division of Viral Products Seminar Series

FDA

- Apply fitness models and multinomial logistic regression models to predict changes in viral variant frequencies
- Assess evolutionary forecasts for retrospective accuracy
- Interpret differences in evolutionary rate between seasonal influenza and SARS-CoV-2

Which factors contributed to the observed high yearly attack rate of SARS-CoV-2 in 2022 and 2023?

- Intrinsic $R_0$ of SARS-CoV-2
- Rapid antigenic evolution of SARS-CoV-2 spike protein
- SARS-CoV-2 is a pandemic virus rather than an epidemic virus

Project to provide a real-time view of the evolving influenza population

All in collaboration with Richard Neher

Real-time tracking of pathogen evolution

*
Richard Neher,
Ivan Aksamentov,
Jennifer Chang,
James Hadfield,
Emma Hodcroft,
John Huddleston,
Jover Lee,
Victor Lin,
Cornelius Roemer,
Thomas Sibley
*

Future frequency $x_i(t+\Delta t)$ of strain $i$ derives from strain fitness $f_i$ and present-day frequency $x_i(t)$, such that

$$x_i(t+\Delta t) = \frac{1}{Z(t)} \, x_i(t) \, \mathrm{exp}(f_i \, \Delta t)$$

Strain frequencies at each timepoint are normalized by total frequency $Z(t)$. This captures clonal interference between competing lineages.
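A minimal sketch of this projection step, assuming three hypothetical strains with made-up frequencies and fitnesses (the names `project_frequencies`, `current` and `fitness` are illustrative, not taken from the talk). Each frequency is scaled by $\mathrm{exp}(f_i \, \Delta t)$ and the result is divided by the sum so that frequencies again total 1:

```python
import math

def project_frequencies(freqs, fitnesses, dt):
    """Project strain frequencies forward by dt under exponential growth."""
    # Unnormalized growth: x_i(t) * exp(f_i * dt)
    grown = {strain: x * math.exp(fitnesses[strain] * dt) for strain, x in freqs.items()}
    # Normalizing constant (plays the role of Z(t) in the formula above)
    z = sum(grown.values())
    return {strain: value / z for strain, value in grown.items()}

# Hypothetical present-day frequencies and fitnesses (illustrative values only)
current = {"A": 0.7, "B": 0.2, "C": 0.1}
fitness = {"A": 0.0, "B": 1.0, "C": 1.5}

projected = project_frequencies(current, fitness, dt=1.0)
for strain, x in projected.items():
    print(f"{strain}: {current[strain]:.2f} -> {x:.2f}")
```

Because of the shared normalization, a strain with positive fitness can still lose frequency if its competitors are fitter, which is how the model expresses clonal interference.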

*
with John Huddleston, Richard Neher, Dave Wentworth, Becky Kondor, John McCauley, Hideki Hasegawa, Kanta Subbarao and others
*

- Estimate of present-day strain frequencies $x_i(t)$
- Estimate of present-day strain fitnesses $f_i$

The fitness $f_i$ of strain $i$ is estimated as

$$f_i = \beta^\mathrm{A} \, f_i^\mathrm{A} + \beta^\mathrm{B} \, f_i^\mathrm{B} + \ldots$$

where $f^\mathrm{A}$, $f^\mathrm{B}$, etc. are different standardized viral attributes and the coefficients $\beta^\mathrm{A}$, $\beta^\mathrm{B}$, etc. are trained on historical evolution
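As a rough illustration of the weighted sum, assuming two hypothetical predictors with made-up standardized values and coefficients (the function `strain_fitness` and the keys `antigenic_drift` and `intrinsic_fitness` are placeholders, not the trained model):

```python
def strain_fitness(attributes, coefficients):
    """Composite fitness: sum over predictors of coefficient * standardized attribute."""
    return sum(coefficients[name] * value for name, value in attributes.items())

# Hypothetical standardized attribute values for one strain (illustrative only)
attributes = {"antigenic_drift": 1.0, "intrinsic_fitness": -0.5}
# Hypothetical coefficients; in practice these would be fit to historical data
coefficients = {"antigenic_drift": 0.8, "intrinsic_fitness": 0.4}

f_i = strain_fitness(attributes, coefficients)
print(f"composite fitness: {f_i:.2f}")
```

Standardizing the attributes first puts the coefficients on a common scale, so their magnitudes can be compared across predictors.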

Antigenic drift | Intrinsic fitness | Recent growth |
---|---|---|
epitope mutations | non-epitope mutations | local branching index |
HI titers | DMS data (via Bloom lab) | delta frequency |

Poor fit

Good fit

- Clade frequencies are the more natural metric, but we switched from assessing them to earth mover's distance because our clade assignments were not stable across trees built at different timepoints.
- Strain fitness $f_i$ is largely fixed by the "fundamentals" of the strain rather than being learned from frequency behavior.

SARS-CoV-2

Influenza H3N2

$x' = \frac{x \, (1+s)}{x \, (1+s) + (1-x)}$ for frequency $x$ over one generation with selective advantage $s$

$x(t) = \frac{x_0 \, (1+s)^t}{x_0 \, (1+s)^t + (1-x_0)}$ for initial frequency $x_0$ over $t$ generations

Trajectories are linear once logit transformed via $\mathrm{log}(\frac{x}{1 - x})$
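A quick numerical check of this linearity, assuming an illustrative starting frequency of 1% and a selective advantage of 10% per generation (the values and function names are made up for this sketch). Successive logit-transformed frequencies should differ by the constant $\mathrm{log}(1+s)$:

```python
import math

def frequency(x0, s, t):
    """Frequency after t generations with selective advantage s."""
    grown = x0 * (1 + s) ** t
    return grown / (grown + (1 - x0))

def logit(x):
    """Log odds of frequency x."""
    return math.log(x / (1 - x))

x0, s = 0.01, 0.1  # illustrative values only
logits = [logit(frequency(x0, s, t)) for t in range(5)]
# Per-generation change in log odds; constant if the trajectory is linear
steps = [later - earlier for earlier, later in zip(logits, logits[1:])]

print([round(step, 6) for step in steps])
print(round(math.log(1 + s), 6))
```

The slope on the logit scale is $\mathrm{log}(1+s)$, so fitting a line to logit-transformed frequencies gives a direct estimate of the selective advantage.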

Multinomial logistic regression across $n$ variants models the probability of a virus sampled at time $t$ belonging to variant $i$ as

$$\mathrm{Pr}(X = i) = x_i(t) = \frac{p_i \, \mathrm{exp}(f_i \, t)}{\sum_{1 \le j \le n} p_j \, \mathrm{exp}(f_j \, t) }$$

with $2n$ parameters: $p_i$, the frequency of variant $i$ at the initial timepoint, and $f_i$, the growth rate or fitness of variant $i$.
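The frequency formula can be sketched directly, assuming three hypothetical variants with made-up initial frequencies and growth rates (this only evaluates the formula for given parameters; it does not fit them to sequence counts):

```python
import math

def mlr_frequencies(p, f, t):
    """Variant frequencies at time t from initial frequencies p and growth rates f."""
    weights = [p_i * math.exp(f_i * t) for p_i, f_i in zip(p, f)]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters for three variants (illustrative values only)
p = [0.90, 0.09, 0.01]   # initial frequencies
f = [0.00, 0.05, 0.10]   # growth rates per day

for t in (0, 30, 60):
    freqs = mlr_frequencies(p, f, t)
    print(t, [round(x, 3) for x in freqs])

# Adding a constant to every growth rate leaves the frequencies unchanged
shifted = mlr_frequencies(p, [f_i + 0.2 for f_i in f], 30)
```

Only differences between growth rates affect the frequencies, so fitness is commonly reported relative to a chosen reference variant.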

Model from Figgins and Bedford. 2022. medRxiv.

Data from UKHSA

Roughly 1 in 3 infections was detected in 2021, compared with 1 in 40 in 2023

Data from ONS

Model from Figgins and Bedford. 2022. medRxiv.

Retrospective projections twice monthly during 2022

30 days out, mean absolute error ranges from 6% to 10% across countries
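For reference, a minimal sketch of mean absolute error between forecast and observed variant frequencies, assuming made-up values for three variants at a single timepoint (how the talk aggregates errors across variants, dates and countries is not specified here, so this averaging is an assumption):

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between paired forecast and observed frequencies."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)

# Hypothetical 30-day-ahead forecast and later observed frequencies (illustrative only)
predicted = [0.60, 0.30, 0.10]
observed = [0.50, 0.35, 0.15]

mae = mean_absolute_error(predicted, observed)
print(f"MAE: {mae:.1%}")
```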

Escape from antibodies that potently neutralize BA.2

- Application of MLR models to seasonal influenza and other pathogens
- Assessing and improving accuracy of "live" models at nextstrain.org/sars-cov-2/forecasts/
- Implementing DMS priors to predict fitness of emerging and yet-to-emerge lineages

**Flu**: WHO Global Influenza Surveillance and Response System, other data producers, GISAID, John Huddleston, Richard Neher, Jennifer Chang, Jover Lee

**SARS-CoV-2**: Data producers from all over the world, GISAID, the Nextstrain team, Katie Kistler, Marlin Figgins, Eslam Abousamra, Jover Lee, James Hadfield

**Bedford Lab**:
John Huddleston,
James Hadfield,
Katie Kistler,
Thomas Sibley,
Jover Lee,
Cassia Wagner,
Miguel Paredes,
Nicola Müller,
Marlin Figgins,
Victor Lin,
Jennifer Chang,
Allison Li,
Eslam Abousamra,
Donna Modrell,
Nashwa Ahmed,
Cécile Tran Kiem

Which factors contributed to the observed high yearly attack rate of SARS-CoV-2 in 2022 and 2023?

- Intrinsic $R_0$ of SARS-CoV-2
- Rapid antigenic evolution of SARS-CoV-2 spike protein
- SARS-CoV-2 is a pandemic virus rather than an epidemic virus