SARS-CoV-2 variant emergence and spread
Trevor Bedford (@trvrb)
Fred Hutchinson Cancer Center / Howard Hughes Medical Institute
29 Aug 2022
SAVE Early Detection Meeting
NIH NIAID
1. SARS-CoV-2 evolution
2. SARS-CoV-2 variant frequency dynamics
3. BA.2.75 as case study
4. Emerging variants
5. Genomic surveillance
Genetic relationships of globally sampled SARS-CoV-2 to present
Rapid displacement of existing diversity by emerging variants
S1 evolved at a rate of 14 amino acid changes per year since pandemic start
Continued rapid accumulation since BA.2 with 8 amino acid changes per year
S1 evolution remarkably fast relative to seasonal influenza
Continued escape from neutralization by existing population immunity
SARS-CoV-2 variant frequency dynamics
Population genetic expectation of variant frequency under selection
$x' = \frac{x \, (1+s)}{x \, (1+s) + (1 - x)}$ for frequency $x$ over one generation with selective advantage $s$
$x(t) = \frac{x_0 \, (1+s)^t}{x_0 \, (1+s)^t + (1 - x_0)}$ for initial frequency $x_0$ over $t$ generations
Trajectories are linear once logit transformed via $\mathrm{log}(\frac{x}{1 - x})$
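The expectation above can be checked numerically. The following is a minimal sketch, not the speaker's code: the function names and the values of `x0` and `s` are illustrative assumptions. It applies the one-generation update, evaluates the closed form, and computes the per-generation change in logit frequency, which should be constant at $\mathrm{log}(1+s)$.

```python
import math

def next_frequency(x, s):
    """One-generation update: x' = x(1+s) / (x(1+s) + (1-x))."""
    return x * (1 + s) / (x * (1 + s) + (1 - x))

def frequency_after(x0, s, t):
    """Closed-form frequency after t generations from initial frequency x0."""
    w = (1 + s) ** t
    return x0 * w / (x0 * w + (1 - x0))

def logit(x):
    """Logit transform log(x / (1 - x))."""
    return math.log(x / (1 - x))

# Illustrative values only (assumptions, not estimates from the talk).
x0, s = 0.01, 0.1

trajectory = [frequency_after(x0, s, t) for t in range(50)]
logits = [logit(x) for x in trajectory]

# Per-generation change in logit frequency; constant if the trajectory is linear.
slopes = [b - a for a, b in zip(logits, logits[1:])]
```

Each entry of `slopes` equals $\mathrm{log}(1+s)$ up to floating-point error, which is why a variant under constant selection traces a straight line in logit space.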
Variants show consistent frequency dynamics in logit space
Multinomial logistic regression
Multinomial logistic regression models the probability of a virus sampled at time $t$ belonging to
variant $i$ as
$$\mathrm{Pr}(X = i) = x_i(t) = \frac{p_i \, \mathrm{exp}(f_i \, t)}{\sum_{1 \le j \le n} p_j \, \mathrm{exp}(f_j \, t) }$$
where the model has $2n$ parameters for $n$ variants: $p_i$, the frequency of variant $i$ at the initial timepoint,
and $f_i$, the growth rate or fitness of variant $i$.
The model is fit to minimize the "log loss" of predicted variant vs observed variant across observations in
the dataset.
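As a rough illustration of the model and its loss, here is a minimal sketch in plain Python. The function names, the two-variant parameter values and the toy observations are all hypothetical. The optimization step is omitted; in practice the parameters would be chosen by a numerical optimizer to minimize `log_loss`.

```python
import math

def variant_frequencies(p, f, t):
    """x_i(t) = p_i exp(f_i t) / sum_j p_j exp(f_j t) for every variant i."""
    weights = [p_i * math.exp(f_i * t) for p_i, f_i in zip(p, f)]
    total = sum(weights)
    return [w / total for w in weights]

def log_loss(p, f, observations):
    """Mean negative log probability of the observed variant.

    observations is a list of (t, i) pairs: a virus sampled at time t
    that was assigned to variant index i.
    """
    total = 0.0
    for t, i in observations:
        total -= math.log(variant_frequencies(p, f, t)[i])
    return total / len(observations)

# Hypothetical two-variant example: variant 1 starts rare but grows faster.
p = [0.9, 0.1]   # initial frequencies (assumed values)
f = [0.0, 0.1]   # growth rates per day (assumed values)

# Toy observations, made up for illustration: (day sampled, variant index).
observations = [(0, 0), (10, 0), (30, 1), (40, 1)]

freqs_day_30 = variant_frequencies(p, f, 30)
loss = log_loss(p, f, observations)
loss_no_advantage = log_loss(p, [0.0, 0.0], observations)
```

On this toy data, `loss` is lower than `loss_no_advantage`, since a model with no growth advantage cannot account for the later samples being dominated by variant 1. Note that only differences between the $f_i$ affect the predicted frequencies: adding the same constant to every $f_i$ leaves $x_i(t)$ unchanged.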
Multinomial logistic regression fits variant frequencies well
Consistent fitness advantage of BA.5 across countries
Despite similar rates of displacement, BA.5 epidemics vary
These differences can be explained by consistent growth advantage of BA.5, but different baseline Rt across countries
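One way to read this claim is sketched below, under a strong simplifying assumption that is not stated in the slides: the Rt of BA.5 in a country is that country's baseline Rt multiplied by a growth advantage shared across countries. The function name, the country labels and all numbers are hypothetical.

```python
def variant_rt(baseline_rt, growth_advantage):
    """Rt of the new variant under the assumed multiplicative model."""
    return baseline_rt * growth_advantage

# Hypothetical numbers, chosen only for illustration.
growth_advantage = 1.3
baseline_rt_by_country = {"country A": 0.8, "country B": 1.1}

ba5_rt_by_country = {
    country: variant_rt(rt, growth_advantage)
    for country, rt in baseline_rt_by_country.items()
}
```

In this toy setup the ratio of BA.5 Rt to baseline Rt is the same in both countries, so displacement proceeds at a similar pace, while the absolute BA.5 Rt differs (about 1.04 vs 1.43), which would produce epidemics of quite different size.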
Multinomial logistic regression should work well for SARS-CoV-2 prediction, except that new variants have
been emerging fast enough that the prediction horizon is quite short
BA.2.75 shows consistent selective advantage across countries
BA.2.75 did not drive large increase in total case loads in India
BA.2.75 current Rt greater than 1 in the US, but lower than initial Rt of BA.5
July 4: Initial frequency data from India were quite alarming
Differential submission lags don't explain discordance
July 4: Early data is geographically focused on Maharashtra
Aug 27: Further sequencing fills out geography
Watch lists for new Pango lineages rising quickly in frequency
Hierarchical MLR models should work well to regularize and pool data
However, mutational makeup did not predict Omicron fitness
Sequences generated and shared at an unprecedented pace with >5M in 2022 (compare to 8k flu genomes in 2022)
Data from gisaid.org
My favorite metric is number of sequences available from samples collected in the past 30 days
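This metric is simple to compute from collection dates alone. A minimal sketch, assuming each available sequence carries a collection date; the function name, the half-open window convention and the toy dates are assumptions made for illustration, not real GISAID data:

```python
from datetime import date, timedelta

def recent_sequence_count(collection_dates, as_of, window_days=30):
    """Count sequences whose samples were collected in the window_days before as_of."""
    cutoff = as_of - timedelta(days=window_days)
    return sum(1 for d in collection_dates if cutoff < d <= as_of)

# Toy collection dates, made up for illustration.
collection_dates = [
    date(2022, 8, 20),
    date(2022, 8, 1),
    date(2022, 7, 30),
    date(2022, 7, 1),
    date(2022, 6, 15),
]

count = recent_sequence_count(collection_dates, as_of=date(2022, 8, 29))
```

Because the count is keyed on collection date rather than submission date, it drops when either sequencing volume falls or turnaround time grows, so it captures throughput and latency together.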
Largely consistent sequence counts for US and Europe, though recent decline in UK
Investments in genomic surveillance should focus on improving latency and throughput in LMICs
Genomic surveillance and modeling to track SARS-CoV-2 variants

Global genomic surveillance with rapid turnaround times is essential, as it allows spotting variants
early and provides geographic replication

With detailed frequency data, MLR models work well to estimate local growth rates

Fitness estimates become more reliable with multiple geographies

Automated systems for near-term fitness estimation and forecasting should be a continued research priority
Acknowledgements
SARS-CoV-2 genomic epi: Data producers from all over the world, GISAID and the Nextstrain team
Bedford Lab:
John Huddleston,
James Hadfield,
Katie Kistler,
Louise Moncla,
Maya Lewinsohn,
Thomas Sibley,
Jover Lee,
Cassia Wagner,
Miguel Paredes,
Nicola Müller,
Marlin Figgins,
Denisse Sequeira,
Victor Lin,
Jennifer Chang