Fred Hutchinson Cancer Center / Howard Hughes Medical Institute
12 Sep 2024
Cobey Lab Seminar
University of Chicago
Slides at: bedford.io/talks
Initially thought clustering due to epi investigation of linked cases at Huanan seafood market
| Outbreak | Confirmed cases | Genomes |
| --- | --- | --- |
| 2013-16 Ebola in West Africa | 29k | 1610 |
| 2015-17 Zika in the Americas | 223k | 942 |
| 2018-19 seasonal flu in US | 290k | 8864 |
| 2020-22 COVID-19 pandemic | 732M | 14.5M |
However, this approach faces significant issues with scalability and sampling bias
One mutation every ~13 days vs duration of infection of ~5 days
114k SARS-CoV-2 genomes from Washington State sentinel surveillance annotated with geographic location and age
$x' = \frac{x \, (1+s)}{x \, (1+s) + (1-x)}$ for frequency $x$ over one generation with selective advantage $s$
$x(t) = \frac{x_0 \, (1+s)^t}{x_0 \, (1+s)^t + (1-x_0)}$ for initial frequency $x_0$ over $t$ generations
Trajectories are linear once logit transformed via $\mathrm{log}(\frac{x}{1 - x})$
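As a minimal sketch of the two formulas above, the following iterates the one-generation update, compares it with the closed form, and checks that the logit of the frequency rises by a constant $\mathrm{log}(1+s)$ per generation. The function names and the values of `x0` and `s` are illustrative assumptions, not taken from the talk.

```python
import math

def next_frequency(x, s):
    """One-generation update: x' = x(1+s) / (x(1+s) + (1-x))."""
    return x * (1 + s) / (x * (1 + s) + (1 - x))

def frequency_at(x0, s, t):
    """Closed-form frequency after t generations from initial frequency x0."""
    growth = (1 + s) ** t
    return x0 * growth / (x0 * growth + (1 - x0))

def logit(x):
    """Log odds of a frequency x in (0, 1)."""
    return math.log(x / (1 - x))

# Illustrative values (assumptions): 1% initial frequency, 10% advantage.
x0, s, generations = 0.01, 0.1, 50

trajectory = [x0]
for _ in range(generations):
    trajectory.append(next_frequency(trajectory[-1], s))

# Per-generation change in logit frequency; each entry should equal log(1+s).
slopes = [logit(b) - logit(a) for a, b in zip(trajectory, trajectory[1:])]

print(trajectory[-1], frequency_at(x0, s, generations))
print(min(slopes), max(slopes), math.log(1 + s))
```

The constant slope follows because $\frac{x'}{1-x'} = (1+s) \, \frac{x}{1-x}$, so each generation adds $\mathrm{log}(1+s)$ to the log odds.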
Multinomial logistic regression across $n$ variants models the probability of a virus sampled at time $t$ belonging to variant $i$ as
$$\mathrm{Pr}(X = i) = x_i(t) = \frac{p_i \, \mathrm{exp}(f_i \, t)}{\sum_{1 \le j \le n} p_j \, \mathrm{exp}(f_j \, t) }$$
with $2n$ parameters consisting of $p_i$ the frequency of variant $i$ at initial timepoint and $f_i$ the growth rate or fitness of variant $i$.
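The frequency formula above can be sketched as a softmax over $\mathrm{log}(p_i) + f_i \, t$. This is only an evaluation of the model for given parameters, not a fitting procedure; the function name and the parameter values are hypothetical.

```python
import math

def variant_frequencies(p, f, t):
    """Return x_i(t) = p_i exp(f_i t) / sum_j p_j exp(f_j t) for each variant.

    p: initial frequencies (positive), f: growth rates, t: time.
    Computed in log space with the maximum subtracted for numerical stability.
    """
    log_weights = [math.log(p_i) + f_i * t for p_i, f_i in zip(p, f)]
    shift = max(log_weights)
    weights = [math.exp(v - shift) for v in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters for three variants (not estimates from the talk).
p = [0.90, 0.09, 0.01]   # frequencies at the initial timepoint
f = [0.00, 0.05, 0.10]   # growth rates per day

for t in (0, 30, 60):
    print(t, [round(x, 3) for x in variant_frequencies(p, f, t)])
```

Under this model the log ratio of any two variants, $\mathrm{log}(x_i / x_j)$, is linear in $t$ with slope $f_i - f_j$, so only differences in growth rate affect the frequencies.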
Retrospective projections twice monthly during 2022
30 days out, countries range from 5 to 15% mean absolute error
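One plausible way to compute a mean absolute error for frequency forecasts is sketched below: average the absolute difference between forecast and retrospectively observed variant frequencies over all variants and forecast dates. The exact averaging used for the figures above is not stated here, so this definition and the numbers are assumptions for illustration only.

```python
def mean_absolute_error(forecast, observed):
    """Average |forecast - observed| over all dates and variants.

    forecast, observed: lists of rows, one row per forecast date,
    each row holding one frequency per variant.
    """
    errors = [
        abs(f - o)
        for f_row, o_row in zip(forecast, observed)
        for f, o in zip(f_row, o_row)
    ]
    return sum(errors) / len(errors)

# Made-up frequencies for two forecast dates and three variants.
forecast = [[0.60, 0.30, 0.10], [0.50, 0.40, 0.10]]
observed = [[0.50, 0.35, 0.15], [0.55, 0.30, 0.15]]

mae = mean_absolute_error(forecast, observed)
print(f"{100 * mae:.1f}% mean absolute error")
```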
This approach improves model accuracy in countries with less intensive genomic surveillance, where accuracy is otherwise poor
Rapid sweep of JN.1 from Dec 2023 to Jan 2024
Serial replacement among descendants of JN.1, with KP.3 the current winner
Rather than estimate variant specific fitness $f_i$ directly, we instead parameterize as the "innovation" in fitness in going from parent lineage $p$ to child lineage $i$ as $\psi_i = (f_i - f_p)$.
We then compare a non-informative model of $$\psi_i = (f_i - f_p) \sim \mathrm{Normal}(0, \sigma)$$ to a model where each "innovation" value has an informed prior based on a linear combination of predictors such as ACE2 binding, immune escape and S1 mutations $$\psi_i = (f_i - f_p) \sim \mathrm{Normal}\left(\sum_k \beta_k \, x_{i,k}, \sigma\right)$$ where $k$ indexes predictors (distinct from the parent lineage $p$), $x_{i,k}$ is the value of predictor $k$ for lineage $i$ and $\beta_k$ is its coefficient.
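To make the parameterization concrete, here is a small forward simulation: each child lineage's fitness is its parent's fitness plus an innovation drawn from a Normal prior whose mean is a linear combination of that lineage's predictors. This sketches the generative side only, not the inference. The lineage names, predictor values, coefficients and function names are all hypothetical.

```python
import random

def prior_mean(betas, predictors):
    """Linear combination of predictor values with coefficients betas."""
    return sum(b * x for b, x in zip(betas, predictors))

def sample_fitness(parents, predictors, betas, sigma, root_fitness=0.0, seed=0):
    """Draw lineage fitnesses as parent fitness plus a Normal innovation.

    parents: dict of lineage -> parent (None for the root), with every
             parent listed before its children.
    predictors: dict of lineage -> predictor values for that lineage.
    Setting all betas to 0 gives the non-informative Normal(0, sigma) model.
    """
    rng = random.Random(seed)
    fitness = {}
    for lineage, parent in parents.items():
        if parent is None:
            fitness[lineage] = root_fitness
            continue
        innovation = rng.gauss(prior_mean(betas, predictors[lineage]), sigma)
        fitness[lineage] = fitness[parent] + innovation
    return fitness

# Hypothetical lineage tree with two predictors per lineage.
parents = {"A": None, "A.1": "A", "A.1.1": "A.1", "A.2": "A"}
predictors = {"A.1": [0.5, 1.0], "A.1.1": [0.2, 0.0], "A.2": [0.0, 0.3]}
betas = [0.10, 0.05]

print(sample_fitness(parents, predictors, betas, sigma=0.02))
```

Because fitness accumulates along the tree, a descendant inherits its ancestors' innovations, and the predictors only need to explain the change from parent to child.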
SARS-CoV-2 genomic epi: Data producers from all over the world, GISAID, UW Virology, BBI, WA PHL
Nextstrain: Richard Neher, Ivan Aksamentov, John SJ Anderson, Kim Andrews, Jennifer Chang, James Hadfield, Emma Hodcroft, John Huddleston, Jover Lee, Victor Lin, Cornelius Roemer, Thomas Sibley
Determinants of transmission: Cécile Tran Kiem, Amanda Perofsky, Miguel Paredes, Lauren Frisbie, Allison Black, Cécile Viboud
Evolutionary forecasting: Marlin Figgins, Jover Lee, James Hadfield, John Huddleston, Eslam Abousamra, Jesse Bloom, Cornelius Roemer, Richard Neher
Bedford Lab: John Huddleston, James Hadfield, Katie Kistler, Thomas Sibley, Jover Lee, Miguel Paredes, Marlin Figgins, Victor Lin, Jennifer Chang, Nashwa Ahmed, Cécile Tran Kiem, Kim Andrews, Cristian Ovaduic, Philippa Steinberg, Jacob Dodds, John SJ Anderson, Amin Bemanian