Data-rich phylodynamics


Trevor Bedford

Fred Hutchinson Cancer Center / Howard Hughes Medical Institute
31 Jan 2024
Science Meeting
Slides at:

Genomic epidemiology during the COVID-19 pandemic

Over 16M SARS-CoV-2 genomes shared to GISAID and evolution tracked in real-time at

Richard Neher, Ivan Aksamentov, Kim Andrews, Jennifer Chang, James Hadfield, Emma Hodcroft, John Huddleston, Jover Lee, Victor Lin, Cornelius Roemer, Thomas Sibley

Three key insights that genomic epi provided during the pandemic

  1. Rapid human-to-human spread in Wuhan beyond initial market outbreak
  2. Extensive local transmission while testing was rare
  3. Identification of variants of concern and mapping of increased transmission rates

Jan 19: First 12 genomes from Wuhan (blue) and Bangkok (red) showed lack of genetic diversity

Data from CAMS, China CDC, Fudan University, Hubei CDC, Thai MOPH, WIV; Figure from

Jan 23: Introduction into the human population between Nov 15 and Dec 15 and subsequent rapid human-to-human spread

Epidemic in the USA was introduced from China in late Jan and from Europe during Feb

Early sequencing provided best estimate of extent of local outbreak

After initial wave, with mitigation efforts and decreased travel, regional clades emerge

Emergence and spread of initial VOC viruses

Described in Rambaut et al. 2020. Figure from
Described in Faria et al. 2021. Figure from

Continued evolution post-Omicron

Future of genomic epidemiology

The COVID-19 pandemic has pushed the field perhaps ~5 years into the future

Outbreak                        Confirmed cases   Genomes
2013-16 Ebola in West Africa    29k               1610
2015-17 Zika in the Americas    223k              942
2018-19 seasonal flu in US      290k              8864
2020-22 COVID-19 pandemic       732M              14.5M

Scalable approaches to genomic epidemiology

  1. Genomic outbreak investigation: across pathogens with focus on spatial dynamics and reconstructing spread
  2. Evolutionary forecasting: focus on seasonal influenza and SARS-CoV-2 for impact on vaccine strain selection

Genomic outbreak investigation

Phylogeography infers migration matrix via phylogeny

However, this approach faces significant issues with scalability and sampling bias

Model emergence, spread and dissolution of clusters of identical sequences

One mutation every ~13 days vs duration of infection of ~5 days
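
A rough back-of-envelope reading of these two numbers, under assumptions that are not stated on the slide: mutations arrive as a Poisson process at a rate of one per 13 days, and the ~5 day duration of infection is treated as the time separating an infector's genome from an infectee's genome. The probability that a transmission pair shares an identical sequence is then approximately

$$P(\text{no mutation}) = \mathrm{exp}\left(-\frac{5}{13}\right) \approx 0.68$$

so under this simplification most direct transmission pairs would carry identical genomes, which is why clusters of identical sequences are informative about recent transmission.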

Tran Kiem et al.

Spatial and social determinants of transmission from clusters of identical sequences

114k SARS-CoV-2 genomes from Washington State sentinel surveillance annotated with geographic location and age

Tran Kiem et al.

Pairs of identical sequences 5.1 times more likely to come from same county

Tran Kiem et al.
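
One way to read an enrichment figure like this is as a ratio of two shares: the share of identical-sequence pairs that come from the same county, divided by the share of all pairs that come from the same county. The sketch below illustrates that ratio on a tiny made-up dataset. It is a simplified illustration and not necessarily the estimator used by Tran Kiem et al.; the function name, the sample tuples, and the county labels are all hypothetical.

```python
from itertools import combinations

def same_county_enrichment(samples):
    """Ratio of the same-county share among identical-sequence pairs
    to the same-county share among all pairs.

    samples: list of (sequence, county) tuples (hypothetical input format).
    Assumes at least one identical pair and one same-county pair exist,
    otherwise a ZeroDivisionError is raised.
    """
    ident_same = ident_total = all_same = all_total = 0
    for (seq_a, county_a), (seq_b, county_b) in combinations(samples, 2):
        same = county_a == county_b
        all_total += 1
        all_same += same
        if seq_a == seq_b:
            ident_total += 1
            ident_same += same
    return (ident_same / ident_total) / (all_same / all_total)

# Made-up toy data, for illustration only.
toy_samples = [
    ("AAAA", "King"), ("AAAA", "King"),
    ("AAAT", "King"), ("AAAT", "Pierce"),
    ("AACA", "Pierce"), ("AACA", "Pierce"),
    ("AAGA", "Spokane"), ("AAGA", "King"),
]
enrichment = same_county_enrichment(toy_samples)
print(round(enrichment, 2))
```

A value above 1 means identical pairs are found in the same county more often than sampling alone would suggest. Enumerating all pairs is quadratic in the number of samples, so a dataset of 114k genomes would need grouping by sequence first rather than this brute-force loop.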

Between county enrichment corresponds well with geography

Tran Kiem et al.

Between age group enrichment shows social mixing patterns

Tran Kiem et al.

Signal for differential local vs long-distance age mixing

Tran Kiem et al.

Evolutionary forecasting

Recent advances

  1. Granular nomenclature and rapid classification
  2. Detailed frequency data allowing multinomial logistic regression

Currently there are 2857 Pango lineages, and samples can be rapidly assigned to lineages with Nextclade or UShER

Variants that differ by just 1-2 mutations will get their own label

Multinomial logistic regression

Multinomial logistic regression across $n$ variants models the frequency $x$ of variant $i$ at time $t$ as

$$x_i(t) = \frac{p_i \, \mathrm{exp}(f_i \, t)}{\sum_{1 \le j \le n} p_j \, \mathrm{exp}(f_j \, t) }$$

estimating parameters for the initial frequency $p$ and the growth rate or fitness $f$.
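
To make the formula concrete, here is a minimal sketch that evaluates $x_i(t)$ for given $p$ and $f$. It only evaluates the model and does not fit it to data. The function name and the numeric values are hypothetical, chosen to show a rare variant with a higher growth rate overtaking a resident variant.

```python
import math

def variant_frequencies(p, f, t):
    """Evaluate x_i(t) = p_i exp(f_i t) / sum_j p_j exp(f_j t) for every variant.

    p: initial frequencies (positive), f: growth rates, t: time.
    """
    # Work with log weights and subtract the maximum for numerical stability;
    # the shift cancels in the ratio, so the result matches the formula.
    log_weights = [math.log(p_i) + f_i * t for p_i, f_i in zip(p, f)]
    shift = max(log_weights)
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical parameters: resident variant vs. a rare, faster-growing variant.
p = [0.99, 0.01]
f = [0.0, 0.1]  # growth rate per day (illustrative)
for t in (0, 30, 60):
    print(t, [round(x, 3) for x in variant_frequencies(p, f, t)])
```

Only differences between growth rates affect the frequencies, so in practice one variant is usually taken as the reference with $f = 0$, as in the example values here.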

Provide continuously updated clade and lineage forecasts

The global sweep of JN.1 is now largely complete

Growth advantage readily identifiable while lineage is still rare

Ongoing work to lengthen prediction horizon by incorporating high-throughput experimental measurements of ACE2 binding and immune escape

Lay groundwork for near-ubiquitous pathogen sequencing


SARS-CoV-2 genomic epi: Data producers from all over the world and GISAID

Nextstrain: Richard Neher, Ivan Aksamentov, Kim Andrews, Jennifer Chang, James Hadfield, Emma Hodcroft, John Huddleston, Jover Lee, Victor Lin, Cornelius Roemer, Thomas Sibley

Determinants of transmission: Cécile Tran Kiem, Amanda Perofsky, Miguel Paredes, Lauren Frisbie, Allison Black, Cécile Viboud

Evolutionary forecasting: Marlin Figgins, Jover Lee, James Hadfield, John Huddleston, Cornelius Roemer, Richard Neher

Bedford Lab: John Huddleston, James Hadfield, Katie Kistler, Thomas Sibley, Jover Lee, Cassia Wagner, Miguel Paredes, Nicola Müller, Marlin Figgins, Victor Lin, Jennifer Chang, Eslam Abousamra, Nashwa Ahmed, Cécile Tran Kiem, Kim Andrews