Tracking and forecasting epidemic spread through viral genome sequencing


Trevor Bedford (@trvrb)
5 Feb 2020
CCB Seminar Series
UC Berkeley
Slides at:

We work at the interface of virology, evolution and epidemiology

Sequencing to reconstruct pathogen spread

Epidemic process

Sample some individuals

Sequence and determine phylogeny

Localized Middle Eastern MERS-CoV phylogeny

Regional West African Ebola phylogeny

Global influenza phylogeny

Phylogenetic tracking has the capacity to revolutionize epidemiology


  • Analysis of Ebola epidemic spread in West Africa
  • Nextstrain platform for real-time phylodynamics
  • Actionable genomic epidemiology for Ebola in the DRC
  • Novel coronavirus(!)


Ebola epidemic of 2014-2016 was unprecedented in scope

Ebola epidemic in West Africa

Ebola epidemic within Sierra Leone

Virus genomes reveal factors that spread and sustained the Ebola epidemic

with Gytis Dudas, Andrew Rambaut, Luiz Carvalho, Marc Suchard, Philippe Lemey, and many others

Sequencing of 1610 Ebola virus genomes collected during the 2013-2016 West African epidemic

Sequenced genomes were representative of spatiotemporal diversity

Phylogenetic reconstruction of epidemic

Tracking migration events

Factors influencing migration rates

Effect of borders on migration rates

Spatial structure at the country level

Substantial mixing at the regional level

Each introduction results in a minor outbreak

Regional outbreaks due to multiple introductions

Ebola spread in West Africa followed a gravity model with moderate slowing by international borders, in which spread is driven by short-lived migratory clusters
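To make the gravity-model idea concrete, here is a minimal sketch of a generic gravity form in which the migration rate between two locations grows with both population sizes and falls with distance, with an optional multiplier for crossing an international border. The function name, the exponents, the constant `k` and the `border_penalty` parameter are all illustrative assumptions; this is not the fitted model from the study.

```python
def gravity_rate(pop_origin, pop_dest, distance_km,
                 k=1.0, alpha=1.0, beta=1.0, gamma=1.0,
                 crosses_border=False, border_penalty=1.0):
    """Generic gravity-model migration rate (illustrative parameters only).

    rate = k * pop_origin^alpha * pop_dest^beta / distance^gamma,
    multiplied by border_penalty (< 1 slows spread) when the pair of
    locations lies on opposite sides of an international border.
    """
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    rate = k * (pop_origin ** alpha) * (pop_dest ** beta) / (distance_km ** gamma)
    if crosses_border:
        rate *= border_penalty
    return rate


# Hypothetical numbers: same populations and distance, with and without a border.
within_country = gravity_rate(100, 200, 10)
across_border = gravity_rate(100, 200, 10, crosses_border=True, border_penalty=0.5)
```

Under these assumed values the cross-border rate is half the within-country rate, which is the qualitative pattern the slide describes as moderate slowing by international borders.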

Actionable inferences

Genomic analyses were mostly done in a retrospective manner

Dudas and Rambaut 2016

Key challenges to making genomic epidemiology actionable

  • Timely analysis and sharing of results critical
  • Dissemination must be scalable
  • Integrate many data sources
  • Results must be easily interpretable and queryable


Project to conduct real-time molecular epidemiology and evolutionary analysis of emerging epidemics

with Richard Neher, James Hadfield, Emma Hodcroft, Thomas Sibley, John Huddleston, Louise Moncla, Misja Ilcisin, Kairsten Fay, Jover Lee, Allison Black, Colin Megill, Sidney Bell, Barney Potter, Charlton Callender

Nextstrain architecture

All code open source at

Two central aims: (1) rapid and flexible phylodynamic analysis and (2) interactive visualization

Rapid build pipeline for 1600 Ebola genomes

  • Align with MAFFT (34 min)
  • Build ML tree with RAxML (54 min)
  • Temporally resolve tree and geographic ancestry with TreeTime (16 min)
  • Total pipeline (1 hr 46 min)

Flexible pipelines constructed through command line modules

  • Modules called via augur filter, augur tree, augur traits, etc.
  • Designed to be composable across pathogen builds
  • Defined pipeline, making steps obvious
  • Provides dependency graph for fast recomputation
  • Pathogen-specific repos give users an obvious foundation to build from
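The dependency-graph point above can be illustrated with a small sketch: if each step declares which steps it depends on, then a change to one step only requires rerunning that step and everything downstream of it. The step names and the `steps_to_rerun` helper below are hypothetical and only loosely mirror the modules named in the list; this is not how the actual pipeline tooling is implemented.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps it depends on.
DEPENDENCIES = {
    "filter": set(),
    "align": {"filter"},
    "tree": {"align"},
    "refine": {"tree"},
    "traits": {"refine"},
    "export": {"refine", "traits"},
}


def steps_to_rerun(dependencies, changed):
    """Return the changed steps plus all downstream steps, in run order."""
    order = list(TopologicalSorter(dependencies).static_order())
    stale = set(changed)
    for step in order:
        # A step is stale if any of its direct dependencies is stale.
        if dependencies[step] & stale:
            stale.add(step)
    return [step for step in order if step in stale]


# Changing only the traits step leaves filter, align, tree and refine untouched.
print(steps_to_rerun(DEPENDENCIES, {"traits"}))
```

In this toy graph, a change to the traits step reruns only traits and export, while a change to filter reruns everything.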

Nextstrain is two things

  • a bioinformatics toolkit and visualization app, which can be used for a broad range of datasets
  • a collection of real-time pathogen analyses kept up-to-date on the website

Rapid on-the-ground sequencing in Makeni, Sierra Leone

"Community" builds to promote frictionless sharing of results

  • Attempting to write us out of the picture
  • JSON outputs uploaded to, would be available at
  • Used now for a variety of pathogens including Lassa in Nigeria, global RSV and cassava virus

Ongoing DRC outbreak

Genomic epidemiology applied to North Kivu Ebola outbreak

with Placide Mbala-Kingebeni, Eddy Kinganda Lusamaki, Catherine Pratt, Mike Wiley, James Hadfield, Allison Black, Jean-Jacques Muyembe Tamfum, Steve Ahuka-Mundeke, Daniel Mukadi, Gustavo Palacios, Amadou Sall, Ousmane Faye, Eric Delaporte, Martine Peeters and many others

Nextstrain is being used to track North Kivu outbreak

Allison Black (PhD student) and James Hadfield (postdoc) working with scientists at the INRB. Goal is to provide training in bioinformatics, Nextstrain and genomic epidemiology.

View of current genomic data

Current dataset:

  • 569 full genomes sequenced (~17% of confirmed cases)
  • Most recent sequenced virus collected Jan 13 (~3 weeks ago)

Often (but not always) need specific actionable pieces of information rather than large-scale understanding. For Ebola outbreak response, I believe this needs to revolve around contact tracing.

Superspreader event in June

Using narratives to walk through specific transmission inferences

  • We're rolling out new narratives functionality in Nextstrain
  • These are Markdown posts that allow you to pair narrative text to visualization state
  • Made possible through an early decision to embed visualization state in URL
  • Example narrative for Ebola in the DRC here:
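As a rough illustration of embedding visualization state in a URL, the sketch below serializes a small state dictionary into a query string and recovers it again, so that a link alone is enough to restore a view. The base URL, the parameter names (`color_by`, `layout`) and both helper functions are hypothetical and are not the parameter scheme the app actually uses.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def state_to_url(base_url, state):
    """Encode a flat dict of view settings as URL query parameters."""
    return f"{base_url}?{urlencode(sorted(state.items()))}"


def url_to_state(url):
    """Recover the view settings from the URL query string."""
    query = urlsplit(url).query
    # parse_qs returns a list per key; keep the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical view state paired with one paragraph of a narrative.
state = {"color_by": "country", "layout": "map"}
url = state_to_url("https://example.org/ebola", state)
restored = url_to_state(url)
```

Because the whole view lives in the link, a narrative can pair each block of text with a URL and the visualization can be redrawn from that URL alone.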

Novel coronavirus (nCoV)

Jan 10: nCoV is a betacoronavirus

Jan 11: And belongs to SARS-like coronaviruses

Jan 11: These viruses have a natural reservoir in bats

Jan 11: Initial 5 nCoV genomes from Wuhan showed highly restricted genetic diversity

Initially thought clustering was due to epidemiological investigation of linked cases at the Huanan seafood market

Jan 17: Additional 2 nCoV genomes from Thailand travel cases also lacked diversity

Jan 19: Additional 5 nCoV genomes from Wuhan still lacked diversity


Single introduction into the human population between Nov 15 and Dec 15 and human-to-human epidemic spread from this point forward
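The link between low genetic diversity and a recent common ancestor can be sketched with back-of-envelope arithmetic: two genomes sampled at the same time that differ at d sites, in a genome of length L evolving at rate μ substitutions per site per year, share an ancestor roughly d / (2 μ L) years earlier. The function name and every number below are assumptions chosen for illustration; this is not the analysis behind the date range on the slide.

```python
def tmrca_years(pairwise_differences, genome_length, subs_per_site_per_year):
    """Rough time back to the common ancestor of two same-day samples.

    Both lineages accumulate substitutions independently since the
    ancestor, hence the factor of 2 in the denominator.
    """
    return pairwise_differences / (2 * genome_length * subs_per_site_per_year)


# Hypothetical inputs: 6 differences, a 30,000 nt genome, and an assumed
# rate of 1e-3 substitutions per site per year.
years = tmrca_years(6, 30_000, 1e-3)
weeks = years * 52
```

A handful of differences across a genome of this size corresponds to weeks rather than years under such a rate, which is why near-identical genomes point to a single recent introduction.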

Spent the week of Jan 20 alerting public health officials, and since then have aimed to keep updated within ~1hr of new sequences being deposited

Data sharing through GISAID

  • Shanghai Public Health Clinical Center, Fudan University, Shanghai, China
  • National Institute for Viral Disease Control and Prevention, China CDC, Beijing, China
  • Institute of Pathogen Biology, Chinese Academy of Medical Sciences, Beijing, China
  • Wuhan Institute of Virology, Chinese Academy of Sciences, Wuhan, China
  • Department of Microbiology, Zhejiang Provincial CDC, Hangzhou, China
  • Guangdong Provincial CDC, Guangzhou, China
  • Shenzhen Key Laboratory of Pathogen and Immunity, Shenzhen, China
  • Hangzhou Center for Disease and Control Microbiology Lab, Zhejiang, China
  • National Institute of Health, Nonthaburi, Thailand

Data sharing through GISAID (continued)

  • National Institute of Infectious Diseases, Tokyo, Japan
  • Korea Centers for Disease Control & Prevention, Cheongju, Korea
  • National Public Health Laboratory, Singapore
  • US Centers for Disease Control and Prevention, Atlanta, USA
  • Institut Pasteur, Paris, France
  • Respiratory Virus Unit, Microbiology Services Colindale, Public Health England
  • Department of Virology, University of Helsinki and Helsinki University Hospital, Helsinki, Finland
  • University of Melbourne, Peter Doherty Institute for Infection & Immunity, Melbourne, Australia
  • Victorian Infectious Disease Reference Laboratory, Melbourne, Australia

Almost real-time with many genomes shared within 3-6 days of sampling

Current state of with 54 genomes as of Feb 4

Providing updated genomic situation reports at

Scientific communication surrounding outbreak has completely flipped with everything posted to bioRxiv, modeling groups posting live analyses and crowd-sourced line lists 🙌🏻

This communication between academics and public health officials has spilled over with huge interest from general public

This is having knock-on effects on science communication and spread of misinformation

Moving forward

  • Epidemic spread still appears to be sustained, with estimates of R0 between 1.5 and 3.5 and an epidemic doubling time of ~6 days
  • We have not yet had time to ascertain effects of intervention measures, but my hope for containment is slim
  • Biggest question for me now surrounds infection-to-fatality ratio
  • I expect genomic data to be most immediately useful to help pin down emerging community transmission
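For a sense of what a ~6 day doubling time implies, here is a minimal sketch converting a doubling time into an exponential growth rate and a fold increase over a given period. It assumes unchecked exponential growth at a constant doubling time, which is a simplification; the function names are illustrative.

```python
import math


def growth_rate_per_day(doubling_time_days):
    """Exponential growth rate r such that cases scale as exp(r * t)."""
    return math.log(2) / doubling_time_days


def fold_increase(days, doubling_time_days):
    """Factor by which case counts grow over `days` at a fixed doubling time."""
    return 2 ** (days / doubling_time_days)


r = growth_rate_per_day(6)           # per-day growth rate for a 6 day doubling time
one_month = fold_increase(30, 6)     # five doublings in 30 days
```

With a 6 day doubling time, 30 days corresponds to five doublings, that is, a 32-fold increase if growth continued unchanged.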


Bedford Lab: Alli Black, John Huddleston, James Hadfield, Katie Kistler, Louise Moncla, Maya Lewinsohn, Thomas Sibley, Jover Lee, Kairsten Fay, Misja Ilcisin, Nicola Müller, Marlin Figgins

Ebola in West Africa: Gytis Dudas, Andrew Rambaut, Luiz Carvalho, Philippe Lemey, Marc Suchard, Andrew Tatem

Nextstrain: Richard Neher, James Hadfield, Emma Hodcroft, Tom Sibley, John Huddleston, Sidney Bell, Barney Potter, Colin Megill, Charlton Callender

Ebola in DRC: James Hadfield, Allison Black, Eddy Kinganda Lusamaki, Placide Mbala-Kingebeni, Catherine Pratt, Mike Wiley, Jean-Jacques Muyembe Tamfum, Steve Ahuka-Mundeke, Daniel Mukadi, Gustavo Palacios, Amadou Sall, Ousmane Faye, Eric Delaporte, Martine Peeters, David Blazes, Cecile Viboud, David Spiro

Novel coronavirus: data producers from all over the world, GISAID,