Real-time tracking of virus evolution

Trevor Bedford (@trvrb)
27 Jan 2016
Combi Seminar
Genome Sciences, University of Washington

Slides at


Phylogenies describe history

Haeckel 1879

Phylogenies reveal process

Darwin 1859


Epidemic process

Sample some individuals

Sequence and determine phylogeny

Middle Eastern MERS-CoV phylogeny

West African Ebola phylogeny

Global influenza phylogeny

Applications of evolutionary analysis for influenza vaccine strain selection and charting outbreak spread


Influenza virion

Influenza H3N2 vaccine updates

H3N2 phylogeny showing antigenic drift

Drift variants rapidly take over the virus population

Timely surveillance and rapid analysis essential to understand ongoing influenza evolution


Project to provide a real-time view of the evolving influenza population

All in collaboration with Richard Neher

nextflu pipeline

  1. Download all recent HA sequences from GISAID
  2. Filter to remove outliers
  3. Subsample across time and space
  4. Align sequences
  5. Build tree
  6. Estimate frequencies
  7. Export for visualization
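
Step 3 of the pipeline can be sketched roughly as follows. This is an illustration only, not the nextflu code: the record fields (`strain`, `region`, `date`), the function name `subsample`, and the cap of two viruses per region per month are assumptions made for the example, and the records are made up.

```python
import random
from collections import defaultdict


def subsample(records, per_bin=2, seed=0):
    """Keep at most `per_bin` viruses per (region, year-month) bin.

    `records` is a list of dicts with hypothetical keys
    'strain', 'region' and 'date' (ISO format, YYYY-MM-DD).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    bins = defaultdict(list)
    for rec in records:
        year_month = rec["date"][:7]  # 'YYYY-MM'
        bins[(rec["region"], year_month)].append(rec)

    kept = []
    for key in sorted(bins):  # deterministic bin order
        group = bins[key]
        if len(group) > per_bin:
            group = rng.sample(group, per_bin)
        kept.extend(group)
    return kept


# Toy, made-up records (not real GISAID data)
records = [
    {"strain": "A/toy/1", "region": "europe", "date": "2015-11-03"},
    {"strain": "A/toy/2", "region": "europe", "date": "2015-11-10"},
    {"strain": "A/toy/3", "region": "europe", "date": "2015-11-17"},
    {"strain": "A/toy/4", "region": "europe", "date": "2015-11-24"},
    {"strain": "A/toy/5", "region": "europe", "date": "2015-12-01"},
    {"strain": "A/toy/6", "region": "asia", "date": "2015-11-05"},
]

kept = subsample(records, per_bin=2)
```

Binning by region and month keeps heavily sampled regions or time periods from dominating the tree.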

Up-to-date analysis publicly available at:

Antigenic evolution

Influenza hemagglutination inhibition (HI) assay

HI measures cross-reactivity across viruses

Data in the form of table of maximum inhibitory titers

Fit HI titer drops to phylogeny branches
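
A minimal sketch of the idea, assuming a purely additive model: each branch carries a titer drop in log2 units, and the predicted drop between a test virus and a serum strain is the sum over the branches on the tree path joining them. The normalization against the autologous titer, the branch names, and all numbers are assumptions for illustration. The fitted model may include further terms.

```python
import math


def log2_titer_drop(autologous_titer, test_titer):
    """Observed drop, in log2 units, of the titer against a test virus
    relative to the serum's titer against its own (autologous) strain.
    This normalization is an assumption of the sketch."""
    return math.log2(autologous_titer) - math.log2(test_titer)


def predicted_drop(path, branch_drop):
    """Sum per-branch titer drops along the tree path joining
    the test virus and the serum strain."""
    return sum(branch_drop.get(branch, 0.0) for branch in path)


# Hypothetical per-branch drops (log2 units), keyed by branch name
branch_drop = {"b1": 0.0, "b2": 2.0, "b3": 1.0}

# Hypothetical path between a test virus and the serum strain
path = ["b1", "b2", "b3"]

observed = log2_titer_drop(autologous_titer=1280, test_titer=160)
predicted = predicted_drop(path, branch_drop)
residual = observed - predicted  # what a fit would try to minimize
```

Fitting then amounts to choosing the per-branch drops that minimize such residuals across all measured virus and serum pairs.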

Model is highly predictive of missing titer values

Recent HI data from WHO CC London annual and interim reports

Up-to-date analysis at:


The future is here, it's just not evenly distributed yet — William Gibson

USA music industry, 2011 dollars per capita

Influenza population turnover

Vaccine strain selection timeline

Seek to explain change in clade frequencies over 1 year


Fitness models can project clade frequencies

Clade frequencies $X$ derive from the fitnesses $f$ and frequencies $x$ of constituent viruses, such that

$$\hat{X}_v(t+\Delta t) = \sum_{i:v} x_i(t) \, \mathrm{exp}(f_i \, \Delta t)$$

This captures clonal interference between competing lineages
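
The projection can be written out as a short sketch. The function names, the toy clades, and the final rescaling so that projected frequencies sum to 1 are assumptions of this example. The equation above is the unnormalized sum.

```python
import math


def project_clade_frequency(freqs, fitnesses, delta_t=1.0):
    """Unnormalized projection: sum_i x_i(t) * exp(f_i * delta_t)."""
    return sum(x * math.exp(f * delta_t) for x, f in zip(freqs, fitnesses))


def project_all(clades, delta_t=1.0, normalize=True):
    """Project every clade; optionally rescale so frequencies sum to 1.

    `clades` maps a clade name to (freqs, fitnesses) of its viruses.
    The rescaling step is an assumption of this sketch.
    """
    projected = {
        name: project_clade_frequency(x, f, delta_t)
        for name, (x, f) in clades.items()
    }
    if normalize:
        total = sum(projected.values())
        projected = {name: value / total for name, value in projected.items()}
    return projected


# Two toy clades: A has two fitter viruses, B has one less fit virus
clades = {
    "A": ([0.3, 0.2], [0.5, 0.5]),
    "B": ([0.5], [-0.5]),
}
projected = project_all(clades, delta_t=1.0)
```

Both clades start at frequency 0.5. After rescaling, clade A grows at the expense of clade B, because what matters is fitness relative to the competing lineages.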

Predictive fitness models

A simple predictive model estimates the fitness $f$ of virus $i$ as

$$\hat{f}_i = \beta^\mathrm{ep} \, f_i^\mathrm{ep} + \beta^\mathrm{ne} \, f_i^\mathrm{ne}$$

where $f_i^\mathrm{ep}$ measures cross-immunity via substitutions at epitope sites and $f_i^\mathrm{ne}$ measures mutational load via substitutions at non-epitope sites

We implement a similar model based on two predictors

  1. Clade frequency change
  2. Antigenic advancement

Project frequencies forward,
growing clades have high fitness

Calculate HI drop from ancestor,
drifted clades have high fitness

Fitness model parameterization

Our predictive model estimates the fitness $f$ of virus $i$ as

$$\hat{f}_i = \beta^\mathrm{freq} \, f_i^\mathrm{freq} + \beta^\mathrm{HI} \, f_i^\mathrm{HI}$$

We learn coefficients and validate model based on previous 15 H3N2 seasons
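
As a small illustration of the linear combination, with made-up predictor values and made-up coefficients (not the learned ones), and with hypothetical names throughout:

```python
def predicted_fitness(f_freq, f_hi, beta_freq, beta_hi):
    """Linear fitness estimate: beta_freq * f_freq + beta_hi * f_hi."""
    return beta_freq * f_freq + beta_hi * f_hi


# Hypothetical predictor values for two viruses
viruses = {
    "v1": {"f_freq": 0.5, "f_hi": 1.0},    # growing and antigenically advanced
    "v2": {"f_freq": -0.25, "f_hi": 0.0},  # declining, no antigenic advance
}

# Made-up coefficients, for illustration only
beta_freq, beta_hi = 1.0, 2.0

fitness = {
    name: predicted_fitness(p["f_freq"], p["f_hi"], beta_freq, beta_hi)
    for name, p in viruses.items()
}
```

The resulting per-virus fitnesses are what feed the clade frequency projection.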

Clade growth rate is well predicted

Growth vs decline correct in 83% of cases
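
One plausible way to score growth vs decline, assuming a clade counts as correct when the predicted and observed frequencies move in the same direction from the starting frequency. The exact definition behind the 83% figure is not given here, and the numbers below are made up.

```python
def growth_sign_accuracy(initial, predicted, observed):
    """Fraction of clades whose predicted direction of change
    (growth or decline) matches the observed direction."""
    correct = 0
    for x0, x_pred, x_obs in zip(initial, predicted, observed):
        if (x_pred > x0) == (x_obs > x0):
            correct += 1
    return correct / len(initial)


# Toy clade frequencies: start, model projection, and observed a year later
initial = [0.10, 0.30, 0.20, 0.40]
predicted = [0.25, 0.20, 0.30, 0.45]
observed = [0.40, 0.10, 0.15, 0.50]

accuracy = growth_sign_accuracy(initial, predicted, observed)
```

In this toy set the third clade was projected to grow but declined, so three of four directions are correct.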

Clade error increases steadily over time

Trajectories show more detailed congruence

Formalizes intuition about drivers of influenza dynamics

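| Model        | Epitope coefficient | HI coefficient | Frequency error | Growth correlation |
|--------------|---------------------|----------------|-----------------|--------------------|
| Epitope only | 2.36                | --             | 0.10            | 0.57               |
| HI only      | --                  | 2.05           | 0.08            | 0.63               |
| Epitope + HI | -0.11               | 2.15           | 0.08            | 0.67               |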

Further work on predictive modeling

  1. Integrate additional predictors and data sources, e.g. plan to investigate a geographic predictor
  2. Possible to build predictive models for H1N1 and B and to forecast NA evolution

Evolutionary analyses can inform influenza vaccine strain selection

Analyses must be rapid and widely available

Predictive models can flag clades for experimental follow-up and creation of vaccine candidates


West African Ebola epidemic nearly contained, but resulted in >28,000 reported cases and >11,000 deaths

Outbreaks are independent spillovers from the animal reservoir

Person-to-person spread in the early West African outbreak

Continued spread through Dec 2014

At epidemic height, geographic spread of particular interest

Rambaut 2015

Later on, tracking transmission clusters of primary importance

Tracking epidemic spread in real-time:

Middle East respiratory syndrome coronavirus (MERS-CoV)

Cases concentrated in the Arabian Peninsula with occasional exports

No evidence of epidemic growth; cases occur as transmission clusters seeded by spillover

Bats ➞ Camels ➞ Humans

Tracking spillover events in real-time:

Moving forward, genetically-informed outbreak response requires:

  • Rapid sharing of sequence data; genetic context is critical
  • Technologies to rapidly conduct phylogenetic inference
  • Technologies to explore genetic relationships and inform epidemiological investigation


Richard Neher (Max Planck Tübingen), Andrew Rambaut (University of Edinburgh), Colin Russell (Cambridge University), Michael Lässig (University of Cologne), Marta Łuksza (Institute for Advanced Study), Gytis Dudas (University of Edinburgh), Pardis Sabeti (Harvard University), Danny Park (Harvard University), Nick Loman (University of Birmingham), Matthew Cotten (Sanger Institute), Paul Kellam (Sanger Institute), WHO Global Influenza Surveillance Network



  • Website:
  • Twitter: @trvrb
  • Slides: