Pathogen phylogenetics for decision making


Trevor Bedford (@trvrb)
Fred Hutchinson Cancer Center / Howard Hughes Medical Institute
23 Aug 2022
From Trees to Public Health Policy Module
International Bioinformatics Workshop on Virus Evolution and Molecular Epidemiology
Slides at:

Sequencing to reconstruct pathogen spread

Epidemic process

Sample some individuals

Sequence and determine phylogeny

Utility of genomic epidemiology

Pathogen genomes may reveal

  • Evolution of new adaptive variants
  • Epidemic origins
  • Patterns of geographic spread
  • Animal-to-human spillover
  • Transmission chains

Influenza: Forecasting spread of new variants for vaccine strain selection

Zika: Uncovering origins of the epidemic in the Americas

Ebola: Revealing spatial spread and persistence in West Africa

MERS: Repeated spillover into the human population from camel reservoir

TB: Tracking individual transmission chains

Genomic epidemiology during the COVID-19 pandemic

Wuhan emergence and human-to-human spread

Jan 11: First five genomes from Wuhan showed a novel SARS-like coronavirus

Clustering was initially thought to be due to epi investigation of linked cases at the Huanan seafood market

Data from CAMS, China CDC, Fudan University, WIV; Figure from

Jan 19: First 12 genomes from Wuhan (blue) and Bangkok (red) showed lack of genetic diversity

Data from CAMS, China CDC, Fudan University, Hubei CDC, Thai MOPH, WIV; Figure from

Jan 23: Introduction into the human population between Nov 15 and Dec 15 and subsequent rapid human-to-human spread

Jan 23: Email blast to colleagues at PHSKC, WA DOH, CDC, BMGF, NIH, UW, Fred Hutch

Jan 26: Media reporting based on technical report and interviews

Ongoing tracking of genomic data via Nextstrain

  • Phylogenetic analysis at up since Jan 19, 2020
  • During this time, watching GISAID and attempting immediate updates as new data appeared
  • Update process became increasingly automated over the subsequent weeks and months

Feb 14: AAAS meeting and BMGF dinner

Following Tufte's advice, I printed out a handout

In retrospect, how could the initial messaging of pandemic risk have been improved?

Cryptic transmission and testing in Seattle

  Seattle Flu Study

Project initiated in the 2018-2019 flu season and continued into the 2019-2020 flu season

Lead investigators: Helen Chu, Michael Boeckh, Janet Englund, Michael Famulare, Barry Lutz, Deborah Nickerson, Mark Rieder, Lea Starita, Matthew Thompson, Trevor Bedford, Jay Shendure

Co-investigators: Amanda Adler, Jeris Bosua, Elisabeth Brandstetter, Kairsten Fay, Chris Frazar, Peter Han, Reena Gulati, James Hadfield, ShiChu Huang, Misja Ilcisin, Michael Jackson, Anahita Kiavand, Louise Kimball, Enos Kline, Kirsten Lacombe, Jover Lee, Jennifer Logue, Victoria Lyon, Kira Newman, Miguel Paredes, Thomas Sibley, Monica Zigman Suchsland, Cassia Wagner, Caitlin Wolf


Feb 2020: Struggle to test samples that were in hand

We started testing samples on Tue Feb 25 with capacity for ~400 tests a day and found the first positive on Thur Feb 27

Sequencing this positive showed a surprising connection

Calls with Mayor Durkan and Governor Inslee focused on understanding results and modeled expectations for epidemic spread

Screening of acute respiratory infections for SARS-CoV-2

Sequencing of viruses collected prior to March 15 detects origins and rate of local spread

A rare introduction from China spread widely, but most of the US epidemic arrived via Europe

How do you balance uncertainty with consequence in alerting?

Continued public communication

Engagement with scientists, public health, policy makers and public through Twitter

  • I discover a strategy in which I can dive deeply into a topic / question, present results on Twitter and then take interviews / meetings from reporters / colleagues
  • Multiple reporters can pull quotes from Twitter rather than having to do repeated interviews

Broadly, I make it my goal to help the public and policy makers understand what's happening with the pandemic. Although there are practical applications of genomic epi and modeling for specific pathogens, I think that understanding is really what these approaches offer more broadly.

I've had many conversations over the course of the pandemic, but I don't think there's been any fundamental difference between conversations with reporters, policy makers or friends and family. In each case, I'm trying to convey my understanding of the world and uncertainty of this understanding in a fashion that's comprehensible.

I believe public health messaging has repeatedly subverted the science in favor of messaging aimed at an intended behavior. This was seen with messaging about masks, airborne transmission, natural immunity, B.1.1.7 causing more severe illness, etc.

Phylogenetics or genomic epi, by itself, is just one avenue towards this sort of understanding, and should be combined with other sources to model / understand what's going on.

Should pandemic communications seek to optimize lives saved or scientific accuracy?

Emergence of variants of concern

After initial wave, with mitigation efforts and decreased travel, regional clades emerge

Repeated emergence of 484K and 501Y across the world

Emergence of Alpha (B.1.1.7) in the UK

Alpha described in Rambaut et al. 2020. Figure from

Emergence of Beta (B.1.351) in South Africa

Beta described in Tegally et al. 2021. Nature. Figure from

Emergence of Gamma (P.1) in Brazil

Gamma described in Faria et al. 2021. Science. Figure from

Lobbying for improved genomic surveillance

Disappointing that focus has been on within-country sequencing rather than global surveillance

Understanding characteristics and origins of variant viruses

Increasingly, focus on tracking variant spread and estimating growth rates
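
As a rough illustration of how a variant growth rate might be estimated from sequence counts, the sketch below fits a straight line to the logit of variant frequency over time. It assumes a constant growth advantage, and the function name `logit_growth_rate` and the counts are hypothetical (the counts are synthetic, generated from a known curve so the fit can be checked). It is not the method behind the figures in these slides.

```python
import math


def logit_growth_rate(days, variant_counts, total_counts):
    """Fit logit(frequency) = intercept + rate * day by least squares.

    Returns (rate, intercept). The rate is the per-day growth advantage of
    the variant relative to everything else circulating.
    """
    xs, ys = [], []
    for day, k, n in zip(days, variant_counts, total_counts):
        if 0 < k < n:  # logit is undefined at frequencies of 0 or 1
            p = k / n
            xs.append(day)
            ys.append(math.log(p / (1 - p)))
    if len(xs) < 2:
        raise ValueError("need at least two time points with 0 < frequency < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    rate = sxy / sxx
    return rate, mean_y - rate * mean_x


# Synthetic weekly counts (NOT real surveillance data): 1000 sequences per
# week, variant frequency following a logistic curve with a known rate.
TRUE_RATE = 0.1  # per day, chosen arbitrarily for the illustration
days = [0, 7, 14, 21, 28]
totals = [1000] * len(days)
variant = [round(n / (1 + math.exp(4 - TRUE_RATE * d))) for d, n in zip(days, totals)]

rate, intercept = logit_growth_rate(days, variant, totals)
# Time for the odds (variant : everything else) to double.
doubling_days = math.log(2) / rate
print(f"growth advantage: {rate:.3f} per day, odds double every {doubling_days:.1f} days")
```

With several variants at once, a multinomial version of the same idea is the more natural choice. The two-category fit is kept here only because it is short.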

Differences in intrinsic Rt across variants, but all trending downwards

Consistent differences in variant-specific transmission rate across states

And general understanding of adaptive evolution in SARS-CoV-2

Multiple experts told the public "we shouldn’t worry when a virus mutates" in 2020. In hindsight, what should we have expected and conveyed for SARS-CoV-2 prior to observation of initial VOCs?

Emergence of Omicron variant

Nov 26: Lineage B.1.1.529 / clade 21K / Omicron variant emerging from basal diversity

Omicron described in Viana et al. 2022. Nature. Figure from

Nov 26: Long branch connecting closest sequenced viruses

Nov 26: Omicron viruses with huge excess of mutations in S1

Dec 4: Projections from rapid epidemic spread in South Africa

Dec 16: Warning of large incipient Omicron epidemics

Warning public health, policy makers and the public of timing and intensity of incoming Omicron wave.

Given the decrease in individual-level severity, policy makers worried primarily about hospital capacity

In hindsight, what would have been an appropriate local policy response to the Jan/Feb Omicron wave, i.e. an enormous burden of infections but decreased per-infection risk? Let's ignore the central issue with travel bans for a moment.

Ongoing evolution of SARS-CoV-2

Aug 2022: Relationships of globally sampled SARS-CoV-2

Rapid displacement of existing diversity by emerging variants

S1 evolved at a rate of 12 amino acid changes per year since pandemic start

Continued rapid accumulation since BA.2 with 7 amino acid changes per year
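
A rate like "amino acid changes per year" can be read as the slope of a regression of per-genome S1 substitution counts against collection date. The sketch below is a minimal version of that calculation. The helper names `decimal_year` and `substitutions_per_year` are hypothetical, and the three data points are synthetic, placed on a line with a slope of 12 only so the arithmetic can be checked. It ignores phylogenetic structure and is not the analysis behind the figures in these slides.

```python
from datetime import date


def decimal_year(d):
    """Convert a date to a decimal year, e.g. 2021-07-02 -> roughly 2021.5."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def substitutions_per_year(samples):
    """Least-squares slope of substitution count against collection date.

    samples: list of (collection_date, substitution_count) pairs, one per genome.
    """
    xs = [decimal_year(d) for d, _ in samples]
    ys = [float(count) for _, count in samples]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic genomes (NOT real data): counts placed exactly on a line whose
# slope is 12 substitutions per year.
samples = [
    (date(2020, 1, 1), 0),
    (date(2021, 1, 1), 12),
    (date(2022, 1, 1), 24),
]
s1_rate = substitutions_per_year(samples)
print(f"{s1_rate:.1f} amino acid changes per year")
```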

S1 evolution remarkably fast relative to seasonal influenza

Continued escape from neutralization by existing population immunity

How do we balance need for timeliness vs clinical trials for vaccine updating?


  • Phylogenetics and genomic epidemiology most impactful early on, when case-based surveillance is poor
  • Phylogenetic analysis should be combined with other sources of knowledge
  • I believe in transparency and public sharing of scientific results / understanding, even if the primary audience is other scientists
  • My (admittedly) ivory tower perspective highlights the critical need to preserve scientific accuracy over falling prey to well-meaning propaganda


SARS-CoV-2 genomic epi: Data producers from all over the world, GISAID and the Nextstrain team

Bedford Lab: John Huddleston, James Hadfield, Katie Kistler, Louise Moncla, Maya Lewinsohn, Thomas Sibley, Jover Lee, Cassia Wagner, Miguel Paredes, Nicola Müller, Marlin Figgins, Denisse Sequeira, Victor Lin, Jennifer Chang