Genomic tracking of SARS-CoV-2 evolution and spread


Trevor Bedford (@trvrb)
Associate Professor, Fred Hutchinson Cancer Research Center
22 Jul 2020
AACR Virtual Meeting: COVID-19 and Cancer
Slides at:

Disclosure Information

AACR Virtual Meeting: COVID-19 and Cancer, Trevor Bedford

I have the following financial relationships to disclose:

  • Grant/Research support from: NIH, Pew Charitable Trusts, Wellcome Trust, HHMI, BMGF

I will not discuss off-label use and/or investigational use in my presentation.

Significant fog of war. Genomic approaches offer an orthogonal data source for understanding the pandemic.

Epidemic process

Sample some individuals

Sequence and determine phylogeny

Jan 11: First five genomes showed that the outbreak was caused by a novel SARS-like coronavirus

Jan 19: First 12 genomes from Wuhan and Bangkok lack genetic diversity

Single introduction into the human population between Nov 15 and Dec 15 and human-to-human epidemic spread from this point forward
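
The step from "little genetic diversity" to "a recent single introduction" can be illustrated with a back-of-envelope Poisson calculation. This is only a sketch, not the analysis behind the slide: the rate of 24 substitutions per genome per year is an assumed round figure that does not come from this talk, and `expected_mutations` and `prob_identical` are hypothetical helper names.

```python
import math

# ASSUMPTION: illustrative genome-wide substitution rate (per genome per year).
# This value is not from the talk; substitute your own estimate.
SUBS_PER_GENOME_PER_YEAR = 24.0


def expected_mutations(days, rate_per_year=SUBS_PER_GENOME_PER_YEAR):
    """Expected number of substitutions on one lineage after `days`."""
    return rate_per_year * days / 365.0


def prob_identical(days, rate_per_year=SUBS_PER_GENOME_PER_YEAR):
    """Poisson probability that a lineage accumulates zero substitutions."""
    return math.exp(-expected_mutations(days, rate_per_year))


# Compare short and long periods of circulation before sampling.
for days in (30, 60, 180):
    print(days, round(expected_mutations(days), 2), round(prob_identical(days), 4))
```

Under these assumptions, genomes sampled a few weeks after their common ancestor are expected to differ by only a handful of substitutions, while many months of circulation would leave far more. Near-identical genomes therefore point to a recent common origin.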


Spent the week of Jan 20 alerting public health officials, and since then have aimed to keep up-to-date


Nextstrain: project to conduct real-time genomic epidemiology and evolutionary analysis of emerging epidemics

with Richard Neher, James Hadfield, Emma Hodcroft, Thomas Sibley, John Huddleston, Louise Moncla, Cassia Wagner, Miguel Paredes, Misja Ilcisin, Kairsten Fay, Jover Lee, Allison Black, Colin Megill, Sidney Bell, Barney Potter, Charlton Callender

Nextstrain architecture

All code open source at

Two central aims: (1) rapid and flexible phylodynamic analysis and (2) interactive visualization

Rapid build pipeline for 3000 SARS-CoV-2 genomes (timings are for a laptop)

  • Align with MAFFT (~20 min)
  • Build ML tree with IQ-TREE (~40 min)
  • Temporally resolve tree and geographic ancestry with TreeTime (~50 min)
  • Total pipeline (~2 hr)
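
The steps above can be sketched as a sequence of command lines. The slide names only MAFFT, IQ-TREE and TreeTime; driving them through `augur` subcommands is an assumption here, as are all file names and the `build_commands` helper. Check the flags against the installed augur version before running anything.

```python
import shlex


def build_commands(prefix="ncov"):
    """Return the pipeline steps as argument lists (nothing is executed)."""
    aligned = f"{prefix}_aligned.fasta"    # hypothetical file names
    raw_tree = f"{prefix}_raw.nwk"
    tree = f"{prefix}.nwk"
    metadata = f"{prefix}_metadata.tsv"
    return [
        # Step 1: align to a reference (augur align wraps MAFFT).
        ["augur", "align", "--sequences", f"{prefix}.fasta",
         "--reference-sequence", "reference.gb",
         "--output", aligned, "--fill-gaps"],
        # Step 2: maximum-likelihood tree with IQ-TREE.
        ["augur", "tree", "--alignment", aligned,
         "--method", "iqtree", "--output", raw_tree],
        # Step 3a: temporally resolve the tree (augur refine wraps TreeTime).
        ["augur", "refine", "--tree", raw_tree, "--alignment", aligned,
         "--metadata", metadata, "--timetree",
         "--output-tree", tree,
         "--output-node-data", f"{prefix}_branch_lengths.json"],
        # Step 3b: infer geographic ancestry for internal nodes.
        ["augur", "traits", "--tree", tree, "--metadata", metadata,
         "--columns", "country",
         "--output-node-data", f"{prefix}_traits.json"],
    ]


if __name__ == "__main__":
    # Print the commands for inspection rather than running them.
    for cmd in build_commands():
        print(shlex.join(cmd))
```

To execute the steps rather than print them, each list could be passed to `subprocess.run(cmd, check=True)` in order, since every step consumes the output of the previous one.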

Current data flow for SARS-CoV-2

  1. Labs contribute directly to GISAID (now have >63k full genomes)
  2. Nextstrain pulls a complete dataset from GISAID every 24 hours
  3. This triggers an automatic rebuild on Amazon Web Services
  4. We manually update new lat/longs, etc.
  5. We push this build online and tweet the update from @nextstrain

Dec/Jan: Emergence from Wuhan in ~Nov 2019

Jan/Feb: Spread within China and seeding elsewhere

Feb/Mar: Epidemic spread within North America and Europe

Mar/Apr: Decreasing transmission with social distancing

Epidemic in the USA was introduced from China in late Jan and from Europe during Feb

Once in the US, virus spread rapidly

Single introduction at the beginning of Feb quickly shows up throughout the country

Sequencing immediately useful for epidemiological understanding, but selection and functional impacts should also be studied

Significant interest in spike mutation D614G

This mutation occurred in the initial European introduction

D614G is prevalent throughout Europe and mixed in US and Australia

D614G is increasing in frequency across states in US and Australia

The success of D614G can be explained by either:

  • D614G is more transmissible and has higher $R_0$
  • founder effects and epidemiological confounding
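
The first explanation can be made concrete with a toy two-variant model. This is a minimal sketch, not the phylodynamic analysis from the talk: it assumes discrete generations, the same generation time for both variants, and reproduction numbers `r_variant` and `r_wild` whose values below are hypothetical.

```python
def next_frequency(p, r_variant, r_wild):
    """Variant frequency after one generation of transmission.

    Each variant's share of new infections is proportional to its current
    frequency times its reproduction number.
    """
    variant = p * r_variant
    wild = (1.0 - p) * r_wild
    return variant / (variant + wild)


def trajectory(p0, r_variant, r_wild, generations):
    """Variant frequencies from generation 0 through `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], r_variant, r_wild))
    return freqs


# Hypothetical values: variant starts at 10% with a 20% transmission advantage.
for gen, p in enumerate(trajectory(0.1, 1.2, 1.0, 20)):
    if gen % 5 == 0:
        print(gen, round(p, 3))
```

With equal reproduction numbers the frequency stays flat in this model, so a consistent rise across many independent locations is what a transmissibility advantage would predict. Founder effects would instead be expected to shift frequencies in directions that differ from place to place.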

Additional evidence from Ct values of clinical specimens

Figure panels: Sheffield, UK; Seattle, USA

Further phylodynamic models

Figure panels: UK; Washington State

Moving forward

  • Better methods for large datasets
  • Distinguishing endogenous spread from importations
  • Tying genomic epidemiology together with richer epi data to better understand local transmission
  • Incorporating within-host variation to improve phylogenetic resolution
  • Integrating clinical data to look for mutations that impact clinical outcomes


Genomic epi: Data producers from all over the world, GISAID and the Nextstrain team

Bedford Lab: Alli Black, John Huddleston, James Hadfield, Katie Kistler, Louise Moncla, Maya Lewinsohn, Thomas Sibley, Jover Lee, Kairsten Fay, Misja Ilcisin, Cassia Wagner, Miguel Paredes, Nicola Müller, Marlin Figgins, Eli Harkins