Fred Hutchinson Cancer Center / Howard Hughes Medical Institute
3 Mar 2026
VRC Tuesday Seminar
NIH Vaccine Research Center
Slides at: bedford.io/talks
Simple haploid population genetic model is equivalent to statistical multinomial logistic regression
The frequency $x_i(t)$ of variant $i$ at time $t$ is determined by its initial frequency $p_i$ along with its fitness $f_i$ following
$$x_i(t) = \frac{p_i \, \mathrm{exp}(f_i \, t)}{\sum_j p_j \, \mathrm{exp}(f_j \, t) }$$
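A minimal sketch of this formula, written as a softmax over the linear predictor $\log p_i + f_i \, t$ so the correspondence with multinomial logistic regression (intercept $\log p_i$, slope $f_i$) is explicit. The function name and the example values are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def variant_frequencies(p, f, t):
    """Return x_i(t) = p_i exp(f_i t) / sum_j p_j exp(f_j t).

    p: initial frequencies (must be strictly positive), f: fitnesses, t: time.
    """
    p = np.asarray(p, dtype=float)
    f = np.asarray(f, dtype=float)
    # Linear predictor of a multinomial logistic regression:
    # intercept log(p_i) plus slope f_i times time.
    logits = np.log(p) + f * t
    # Subtract the maximum before exponentiating for numerical stability;
    # this cancels in the ratio and does not change the result.
    logits = logits - logits.max()
    weights = np.exp(logits)
    return weights / weights.sum()

# Illustrative values only (hypothetical variants, fitness per day).
p0 = [0.90, 0.09, 0.01]
fitness = [0.00, 0.05, 0.10]
x30 = variant_frequencies(p0, fitness, 30.0)
```

Under this parameterization, the log frequency ratio of any two variants is linear in time, with slope equal to their fitness difference.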
Retrospective projections twice monthly during 2022
At 30 days out, mean absolute error ranges from 5% to 15% across countries
Error correlates with data availability (median number of sequences available from the previous 30 days)
BA.3 has been slowly growing, though it does not yet have much of a toehold in the US
With variant frequency $x_i(t)$ and constant variant fitness $f_i$
Mean population fitness $\bar{f}(t) = \sum_i x_i(t) \, f_i$
Fitness flux $\phi(t) = \Delta \bar{f}(t) / \Delta t$
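As a rough sketch, both quantities can be computed from a matrix of variant frequencies over time. The array layout (rows are time points, columns are variants), the function names, and the numbers are assumptions for illustration only.

```python
import numpy as np

def mean_fitness(x, f):
    """Mean population fitness: sum_i x_i(t) f_i for each time point (row of x)."""
    return np.asarray(x, dtype=float) @ np.asarray(f, dtype=float)

def fitness_flux(mean_f, times):
    """Finite-difference fitness flux: change in mean fitness per unit time."""
    return np.diff(np.asarray(mean_f, dtype=float)) / np.diff(np.asarray(times, dtype=float))

# Illustrative frequencies at three time points for three hypothetical variants.
x = np.array([
    [0.80, 0.15, 0.05],
    [0.60, 0.25, 0.15],
    [0.40, 0.30, 0.30],
])
f = np.array([0.0, 0.5, 1.0])      # constant variant fitnesses
times = np.array([0.0, 1.0, 2.0])

fbar = mean_fitness(x, f)          # one value per time point
flux = fitness_flux(fbar, times)   # one value per interval
```

The flux has one fewer entry than the mean fitness series because it is defined per interval between consecutive time points.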
Constant clade fitness within each window, USA data only, ignores within-clade fitness variation
Initially doubled in fitness every year, but has slowed since 2024
Line thickness is proportional to variant frequency, 44 total variants
Constant clade fitness within each window, USA data only, ignores within-clade fitness variation
Line thickness is proportional to variant frequency, 32 total variants
Richard Neher and others have analytically characterized these waves
Diffusion constant $D = \mu \, \langle \delta^2 \rangle/2$, where the average $\langle \ldots \rangle$ is over the distribution of mutational effects $K(\delta)$
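A small sketch of the diffusion constant, approximating the average over $K(\delta)$ by a sample mean. It assumes $\mu$ is the mutation rate and that the supplied effects are equally weighted draws from $K(\delta)$; the function name and the values are hypothetical.

```python
import numpy as np

def diffusion_constant(mu, deltas):
    """Approximate D = mu * <delta^2> / 2 using sampled mutational effects."""
    deltas = np.asarray(deltas, dtype=float)
    # Sample mean of delta^2 stands in for the average over K(delta).
    return mu * np.mean(deltas ** 2) / 2.0

# Illustrative values only.
mu = 0.01
deltas = np.array([-0.02, -0.01, 0.01, 0.02])
D = diffusion_constant(mu, deltas)
```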
"The rate of increase in fitness of any organism at any time is equal to its genetic variance in fitness at that time," i.e.
$$\frac{d\bar{f}}{dt} = \mathrm{Var}(f)$$
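Under the constant-fitness model $x_i(t) \propto p_i \, \mathrm{exp}(f_i \, t)$, each frequency obeys $dx_i/dt = x_i \, (f_i - \bar{f})$, so $d\bar{f}/dt = \sum_i f_i \, x_i \, (f_i - \bar{f}) = \mathrm{Var}(f)$. The sketch below checks this numerically with a central finite difference; the helper names, step size, and example values are assumptions for illustration.

```python
import numpy as np

def frequencies(p, f, t):
    """x_i(t) = p_i exp(f_i t) / sum_j p_j exp(f_j t)."""
    w = np.asarray(p, dtype=float) * np.exp(np.asarray(f, dtype=float) * t)
    return w / w.sum()

def mean_fitness(p, f, t):
    """Frequency-weighted mean fitness at time t."""
    return float(np.dot(frequencies(p, f, t), f))

def fitness_variance(p, f, t):
    """Frequency-weighted variance in fitness at time t."""
    x = frequencies(p, f, t)
    f = np.asarray(f, dtype=float)
    fbar = np.dot(x, f)
    return float(np.dot(x, (f - fbar) ** 2))

# Illustrative values only.
p = [0.7, 0.2, 0.1]
f = [0.0, 0.3, 0.8]
t, h = 2.0, 1e-5

# Central difference approximation to d(mean fitness)/dt.
dfbar_dt = (mean_fitness(p, f, t + h) - mean_fitness(p, f, t - h)) / (2 * h)
variance = fitness_variance(p, f, t)
```

The two quantities agree up to finite-difference error, and both shrink toward zero as the fittest variant approaches fixation.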
Expand to 367 Pango lineages with at least 1000 sequence counts in the US from 2020 to 2025
Similar concept to Obermeyer et al.
In this case, a very simple model of mutation counts
Note also the importance of properly assessing independent parent/child lineage deltas
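One way to read the point about parent/child deltas: relate the change in fitness between a lineage and its parent to the change in mutation count, rather than comparing raw lineage values that share ancestry. The sketch below is an assumption about how such deltas could be computed, not the method used in the talk; the record layout, lineage names, and numbers are all hypothetical.

```python
import numpy as np

# Hypothetical toy records: lineage -> (parent, fitness, mutation count).
lineages = {
    "X":     (None,  0.0, 0),
    "X.1":   ("X",   0.3, 2),
    "X.1.1": ("X.1", 0.5, 3),
    "X.2":   ("X",   0.1, 1),
}

def parent_child_deltas(records):
    """Return (delta mutation count, delta fitness) for each child lineage."""
    d_mut, d_fit = [], []
    for name, (parent, fitness, n_mut) in records.items():
        # Skip roots and lineages whose parent is not in the dataset.
        if parent is None or parent not in records:
            continue
        _, parent_fitness, parent_mut = records[parent]
        d_mut.append(n_mut - parent_mut)
        d_fit.append(fitness - parent_fitness)
    return np.array(d_mut, dtype=float), np.array(d_fit, dtype=float)

d_mut, d_fit = parent_child_deltas(lineages)

# Least-squares slope through the origin: fitness change per added mutation.
slope = float(np.dot(d_mut, d_fit) / np.dot(d_mut, d_mut))
```

Each delta uses one parent/child pair, so a fitness gain inherited from a shared ancestor is counted once rather than once per descendant.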
But even these DMS measurements have short time horizons
Seasonal influenza and SARS-CoV-2 genomics: Data producers from all over the world, GISAID
Nextstrain: Richard Neher, Ivan Aksamentov, John SJ Anderson, Kim Andrews, Jennifer Chang, James Hadfield, Emma Hodcroft, John Huddleston, Jover Lee, Victor Lin, Cornelius Roemer, Thomas Sibley
MLR and fitness modeling: Marlin Figgins, Eslam Abousamra, Jover Lee, James Hadfield, John Huddleston, Philippa Steinberg, Jesse Bloom, Cornelius Roemer, Richard Neher
Bedford Lab: John Huddleston, James Hadfield, Katie Kistler, Jover Lee, Marlin Figgins, Victor Lin, Nashwa Ahmed, Cécile Tran Kiem, Philippa Steinberg, Jacob Dodds, Amin Bemanian, Carlos Avendano, Aayush Verma