Genomic epidemiology of SARS-CoV-2


Trevor Bedford (@trvrb)
Oct 28, 2021
MCB532 Human Pathogenic Viruses

Sequencing to reconstruct pathogen spread

Epidemic process

Sample some individuals

Sequence and determine phylogeny

Genomic epidemiology during the COVID-19 pandemic

>4.6M SARS-CoV-2 genomes shared to GISAID with 630k genomes in Aug 2021 alone

Data from gisaid.org

Three key insights that genomic epidemiology provided during the pandemic, and the concepts that underlie them

  1. Rapid human-to-human spread in Wuhan beyond initial market outbreak (molecular clocks)
  2. Extensive local transmission while testing was rare (phylogeography)
  3. Identification of variants of concern and mapping of increased transmission rates (adaptive evolution)

Molecular clocks and early human-to-human spread

Mutations tend to accumulate in a clock-like fashion

"Root-to-tip" plots show temporal signal

Allows conversion between branch length and time
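
As a rough illustration of the root-to-tip idea, the sketch below regresses root-to-tip distance on sampling date. The slope is an estimate of the clock rate and the x-intercept an estimate of the root date. The function names, the sample dates, and the rate of 8e-4 substitutions per site per year are assumptions chosen for illustration, not values taken from these slides.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    dates:     sampling dates in decimal years
    distances: root-to-tip distances in substitutions per site
    Returns (rate, root_date): the slope in subs/site/year and the
    date at which the fitted line crosses zero distance.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    # x-intercept of the fitted line, written relative to the means
    root_date = mean_t - mean_d / rate
    return rate, root_date


def branch_length_to_years(branch_length, rate):
    """Convert a branch length (subs/site) to elapsed time (years)."""
    return branch_length / rate


# Hypothetical illustration data, not real measurements: four tips that
# sit exactly on a clock with an assumed rate and root date.
ASSUMED_RATE = 8e-4      # subs/site/year (assumption)
ASSUMED_ROOT = 2019.9    # decimal year (assumption)
sample_dates = [2020.0, 2020.1, 2020.2, 2020.3]
distances = [ASSUMED_RATE * (t - ASSUMED_ROOT) for t in sample_dates]

rate, root_date = root_to_tip_regression(sample_dates, distances)
print(f"estimated rate: {rate:.2e} subs/site/year")
print(f"estimated root date: {root_date:.2f}")
print(f"1e-4 subs/site is about {branch_length_to_years(1e-4, rate) * 365:.0f} days")
```

With real data the tips scatter around the line, and the tightness of that scatter is what the slides call temporal signal.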

Jan 11: First five genomes from Wuhan showed a novel SARS-like coronavirus

Jan 19: First 12 genomes from Wuhan and Bangkok showed lack of genetic diversity

Single introduction into the human population between Nov 15 and Dec 15, 2019, and subsequent rapid human-to-human spread

Phylogeography and inferring local transmission

"Data" is a phylogeny and tip states

States include nucleotides, amino acids, geographic locations, hosts, etc.

Model infers transition matrix and ancestral states

Rare transitions, short branches and many taxa increase confidence
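
To make "phylogeny plus tip states in, ancestral states out" concrete, here is a minimal sketch using Fitch parsimony. Parsimony is swapped in as a simpler stand-in for the transition-matrix model described above; it returns the set of most-parsimonious root states and the minimum number of state changes. The tree, tip names, and locations are hypothetical.

```python
def fitch(tree, tip_states):
    """Fitch parsimony on a rooted binary tree.

    tree:       a tip name (str) or a pair (left_subtree, right_subtree)
    tip_states: dict mapping tip name -> observed state
    Returns (candidate_states, n_changes) for the subtree's root.
    """
    if isinstance(tree, str):
        return {tip_states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, tip_states)
    right_states, right_changes = fitch(right, tip_states)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no change needed here
        return shared, left_changes + right_changes
    # Children disagree: keep both options and count one change
    return left_states | right_states, left_changes + right_changes + 1


# Hypothetical five-tip tree and sampling locations (illustration only)
tree = (("A", "B"), ("C", ("D", "E")))
tip_states = {"A": "China", "B": "China", "C": "China",
              "D": "USA", "E": "USA"}

root_states, n_transitions = fitch(tree, tip_states)
print(f"root state(s): {root_states}, transitions: {n_transitions}")
```

In this toy tree the two USA tips form a single clade nested within China tips, so the reconstruction implies one introduction. A model-based approach additionally gives probabilities for each ancestral state, which is where branch lengths and number of taxa affect confidence.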

Rapid global epidemic spread from China

Epidemic in the USA was introduced from China in late Jan and from Europe during Feb

Seattle Flu Study detects local circulation and charts early epidemic in Washington State

Direct introduction from China ~Feb 1 responsible for the majority of the epidemic

Mitigation efforts visible in virus genetic diversity

Similar stories from elsewhere in the world, particularly when genomic sequencing is strong relative to case-based surveillance

After initial wave, with mitigation efforts and decreased travel, regional clades emerge

Adaptive evolution and the emergence and spread of variants of concern

Mutations that increase viral fitness will increase in frequency

  1. This distorts coalescent branching patterns
  2. This increases observed rate of mutations at selected sites

Neutral population dynamics

Episodic positive selection shows selective sweeps

Spread of VOC / VOI lineages across the world

Delta is outcompeting other variants and is on track to sweep

Phylogeny of 10k genomes equitably sampled in space and time

Measure clade growth as a proxy for viral fitness
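
One simple way to quantify clade growth, sketched here under assumptions, is to fit a straight line to the logit of a clade's frequency through time. The slope is a logistic growth rate per day. This is an illustrative proxy and not necessarily the measure used in the analysis on these slides. The frequencies below are generated from an assumed growth rate of 0.1 per day.

```python
import math


def logit(p):
    """Log-odds of a frequency strictly between 0 and 1."""
    return math.log(p / (1 - p))


def logistic_growth_rate(days, frequencies):
    """Least-squares slope of logit(frequency) against time (per day)."""
    ys = [logit(f) for f in frequencies]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, ys))
    return sxy / sxx


# Hypothetical clade frequencies, not real data: logistic growth from
# 1% at day 0 with an assumed rate of 0.1 per day, observed weekly.
ASSUMED_RATE = 0.1
START_FREQ = 0.01
days = [0, 7, 14, 21, 28]
frequencies = [
    1 / (1 + (1 / START_FREQ - 1) * math.exp(-ASSUMED_RATE * d))
    for d in days
]

growth_rate = logistic_growth_rate(days, frequencies)
odds_doubling_days = math.log(2) / growth_rate
print(f"logistic growth rate: {growth_rate:.3f} per day")
print(f"odds doubling time: {odds_doubling_days:.1f} days")
```

Comparing this slope across clades gives a relative measure: a clade with a larger slope is displacing its competitors faster, whatever the overall epidemic is doing.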

Clades with more S1 nonsynonymous mutations grow faster

S1 is quickly evolving and highly correlated with clade growth

dN/dS through time further highlights adaptive evolution

Rapid pace of adaptive evolution relative to H3N2 influenza

Huge opportunity to work directly from frequency trajectories

Differences in intrinsic Rt across variants, but all trending downwards

Figgins and Bedford. Unpublished.

Consistent differences in variant-specific transmission rate across states

Figgins and Bedford. Unpublished.

Future of genomic epidemiology

The COVID-19 pandemic has pushed the field roughly 5 years into the future


Outbreak                       Confirmed cases   Genomes
2013-16 Ebola in West Africa   29k               1610
2015-17 Zika in the Americas   223k              942
2018-19 seasonal flu in US     290k              8864
2020-21 COVID-19 pandemic      245M              4.6M