Real-time tracking of virus evolution

Trevor Bedford (@trvrb)
3 Mar 2016
Infectious Disease Epidemiology Seminar
Harvard School of Public Health

Slides at


Phylogenies describe history

Haeckel 1879

Phylogenies reveal process

Darwin 1859


Epidemic process

Sample some individuals

Sequence and determine phylogeny

West African Ebola phylogeny

Global influenza phylogeny

Applications of evolutionary analysis for influenza vaccine strain selection and charting outbreak spread


Previous research has focused on:

  1. Antigenic drift
  2. Geographic circulation

Influenza virus

Influenza H3N2 vaccine updates

H3N2 phylogeny showing antigenic drift

Flu pandemics caused by host switch events

Influenza B does not have pandemic potential

Phylogenetic trees of different influenza lineages

Antigenic drift

with Andrew Rambaut, Marc Suchard and others

Bedford et al 2014. Integrating influenza antigenic dynamics
with molecular evolution. eLife.

Influenza hemagglutination inhibition (HI) assay

HI measures cross-reactivity across viruses

Data take the form of a table of maximum inhibitory titers

Compiled HI data difficult to work with

Antigenic cartography positions viruses and sera to recapitulate titer values
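A minimal sketch of the quantity a cartographic fit tries to minimize, assuming the common convention that the target map distance between a virus and a serum is the log2 drop of their titer from that serum's maximum titer. The function names, coordinates and titer values are illustrative, not taken from the talk.

```python
import math

def target_distances(titers):
    """titers[virus][serum] holds raw HI titers.
    Returns the log2 drop of each titer from its serum's maximum titer."""
    max_by_serum = {}
    for row in titers.values():
        for serum, titer in row.items():
            max_by_serum[serum] = max(max_by_serum.get(serum, 0), titer)
    return {
        (virus, serum): math.log2(max_by_serum[serum]) - math.log2(titer)
        for virus, row in titers.items()
        for serum, titer in row.items()
    }

def stress(virus_xy, serum_xy, targets):
    """Sum of squared differences between target and map distances."""
    total = 0.0
    for (virus, serum), d in targets.items():
        total += (d - math.dist(virus_xy[virus], serum_xy[serum])) ** 2
    return total

# Hypothetical titers: each serum reacts best with its own virus.
titers = {
    "V1": {"S1": 1280, "S2": 160},
    "V2": {"S1": 160, "S2": 1280},
}
targets = target_distances(titers)

# A map that places V2 and S2 three antigenic units from V1 and S1.
virus_xy = {"V1": (0.0, 0.0), "V2": (3.0, 0.0)}
serum_xy = {"S1": (0.0, 0.0), "S2": (3.0, 0.0)}
map_stress = stress(virus_xy, serum_xy, targets)
```

An 8-fold titer drop corresponds to 3 antigenic units here, so this placement fits the table with essentially zero stress. A real fit would search over coordinates rather than check a single placement.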



Combine phylogeny and HI data to estimate a joint antigenic map

Drift results from selective advantage of antigenically novel lineages

Phylogenetic trees of different influenza lineages

Antigenic phenotype across lineages

Antigenic drift across lineages

First study to embed a model of the process of antigenic evolution on a phylogeny. More than just description.

Geographic circulation

with Colin Russell, Philippe Lemey and many others

Bedford et al 2015. Global circulation patterns of seasonal influenza viruses vary with rates of antigenic drift. Nature.

Seasonality in influenza

Sample H3N2 from around the world

Treating geographic state as an evolving character

Phylogeny of H3 with geographic history

Infer geographic transition matrix

Air travel predicts migration rates

Geographic location of phylogeny trunk

Region-specific ancestry

Phylogenies across subtypes / lineages

H3N2 phylogeny

H1N1 phylogeny

B/Vic phylogeny

B/Yam phylogeny

Ancestry patterns across lineages

Regional persistence patterns

How to explain these differences?

Age distribution across viruses

Air travel differences between adults and children

Epidemiological model of varying rates of antigenic drift

Results of varying antigenic drift

Interaction between virus evolution, epidemiology and human behavior drives migration rate differences

Static vs dynamic inferences and the living paper

Influenza H3N2 vaccine updates


Project to provide a real-time view of the evolving influenza population

All in collaboration with Richard Neher

nextflu pipeline

  1. Download all recent HA sequences from GISAID
  2. Filter to remove outliers
  3. Subsample across time and space
  4. Align sequences
  5. Build tree
  6. Estimate frequencies
  7. Export for visualization
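Step 3 could be sketched as follows, assuming each sequence record carries a region and a collection year and month. The record fields, the `subsample` function and the `per_bin` parameter are hypothetical and not the actual nextflu code.

```python
import random
from collections import defaultdict

def subsample(records, per_bin=2, seed=0):
    """Keep at most `per_bin` records from each (region, year, month) bin."""
    rng = random.Random(seed)
    bins = defaultdict(list)
    for rec in records:
        bins[(rec["region"], rec["year"], rec["month"])].append(rec)
    kept = []
    for key in sorted(bins):
        group = bins[key]
        rng.shuffle(group)  # random choice within a bin
        kept.extend(group[:per_bin])
    return kept

# Hypothetical records: Europe is heavily oversampled in January 2015.
records = [
    {"name": "eu_1", "region": "europe", "year": 2015, "month": 1},
    {"name": "eu_2", "region": "europe", "year": 2015, "month": 1},
    {"name": "eu_3", "region": "europe", "year": 2015, "month": 1},
    {"name": "eu_4", "region": "europe", "year": 2015, "month": 1},
    {"name": "as_1", "region": "asia", "year": 2015, "month": 1},
    {"name": "as_2", "region": "asia", "year": 2015, "month": 2},
]
kept = subsample(records, per_bin=2)
```

Capping each bin keeps heavily sequenced regions and months from dominating the tree.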

Up-to-date analysis publicly available at:

Include HI data by assigning titer drops to phylogeny branches
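A minimal sketch of how branch-level titer drops could predict a titer between two viruses, assuming the predicted log2 drop is the sum of drops on the branches separating them in the tree. The toy tree, the drop values and the function names are illustrative. Terms such as virus- or serum-specific effects are left out.

```python
def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def predicted_titer_drop(a, b, parent, branch_drop):
    """Sum the drops on branches that separate tips a and b.
    Each branch is keyed by its child node."""
    anc_a = path_to_root(a, parent)
    anc_b = path_to_root(b, parent)
    shared = set(anc_a) & set(anc_b)
    # Branches above the common ancestor lie on both root paths and are excluded.
    separating = [n for n in anc_a + anc_b if n not in shared]
    return sum(branch_drop.get(n, 0.0) for n in separating)

# Toy tree: root -> X -> (A, B) and root -> C.
parent = {"A": "X", "B": "X", "X": "root", "C": "root"}
# Hypothetical log2 titer drops assigned to branches; unlisted branches have zero drop.
branch_drop = {"X": 1.0, "A": 0.5, "C": 2.0}

drop_ab = predicted_titer_drop("A", "B", parent, branch_drop)
drop_ac = predicted_titer_drop("A", "C", parent, branch_drop)
```

Because most branches can carry zero drop, a missing titer between two viruses can be predicted from the few branches with large drops that lie between them.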

Model is highly predictive of missing titer values

Broad patterns agree with cartographic analyses

Recent HI data from WHO CC London annual and interim reports

Up-to-date analysis at:


The future is here, it's just not evenly distributed yet — William Gibson

USA music industry, 2011 dollars per capita

Influenza population turnover

Vaccine strain selection timeline

Seek to explain change in clade frequencies over 1 year

Fitness models can project clade frequencies

Clade frequencies $X$ derive from the fitnesses $f$ and frequencies $x$ of constituent viruses, such that

$$\hat{X}_v(t+\Delta t) = \sum_{i:v} x_i(t) \, \mathrm{exp}(f_i \, \Delta t)$$

This captures clonal interference between competing lineages
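A minimal sketch of the projection, using made-up frequencies and fitnesses. `project_clade` follows the formula as written. Rescaling the projected clade frequencies so they sum to one is an added assumption, not something stated on the slide.

```python
import math

def project_clade(freqs, fitnesses, dt):
    """Unnormalized projection: sum of x_i * exp(f_i * dt) over viruses in the clade."""
    return sum(x * math.exp(f * dt) for x, f in zip(freqs, fitnesses))

def project_all(clades, dt):
    """Project every clade, then rescale so projected frequencies sum to 1
    (the rescaling is an assumption of this sketch)."""
    raw = {name: project_clade(x, f, dt) for name, (x, f) in clades.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

# Hypothetical clades: (virus frequencies, virus fitnesses).
clades = {
    "growing": ([0.2, 0.1], [0.8, 0.6]),
    "declining": ([0.7], [-0.3]),
}
projected = project_all(clades, dt=1.0)
```

In this toy case the clade that starts at frequency 0.3 is projected to overtake the clade that starts at 0.7, because its viruses have higher fitness.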

Predictive fitness models

A simple predictive model estimates the fitness $f$ of virus $i$ as

$$\hat{f}_i = \beta^\mathrm{ep} \, f_i^\mathrm{ep} + \beta^\mathrm{ne} \, f_i^\mathrm{ne}$$

where $f_i^\mathrm{ep}$ measures cross-immunity via substitutions at epitope sites and $f_i^\mathrm{ne}$ measures mutational load via substitutions at non-epitope sites

We implement a similar model based on two predictors

  1. Clade frequency change
  2. Antigenic advancement

Project frequencies forward,
growing clades have high fitness

Calculate HI drop from ancestor,
drifted clades have high fitness

Fitness model parameterization

Our predictive model estimates the fitness $f$ of virus $i$ as

$$\hat{f}_i = \beta^\mathrm{freq} \, f_i^\mathrm{freq} + \beta^\mathrm{HI} \, f_i^\mathrm{HI}$$
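As a sketch, the fitness estimate is a weighted sum of the two predictors. The predictor values and coefficients below are placeholders chosen for illustration, not fitted values.

```python
def estimate_fitness(f_freq, f_hi, beta_freq, beta_hi):
    """Linear fitness estimate from the frequency-change and HI predictors."""
    return beta_freq * f_freq + beta_hi * f_hi

# Hypothetical predictor values per virus: (f_freq, f_hi).
viruses = {
    "virus_a": (0.4, 1.2),   # growing and antigenically advanced
    "virus_b": (-0.1, 0.3),  # declining and less advanced
}
beta_freq, beta_hi = 1.0, 2.0  # placeholder coefficients

fitness = {
    name: estimate_fitness(f_freq, f_hi, beta_freq, beta_hi)
    for name, (f_freq, f_hi) in viruses.items()
}
```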

We learn coefficients and validate model based on previous 15 H3N2 seasons

Predicted and observed clade growth rates are well correlated (ρ = 0.66)

Growth vs decline correct in 84% of cases
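The two validation summaries can be sketched as a correlation between predicted and observed growth and the share of clades whose direction is called correctly. Pearson correlation is an assumption here, since the slide does not say which coefficient ρ denotes, and the numbers are made up.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def direction_accuracy(predicted, observed):
    """Fraction of clades where predicted and observed growth share a sign."""
    hits = sum((p > 0) == (o > 0) for p, o in zip(predicted, observed))
    return hits / len(predicted)

# Hypothetical growth rates for four clades (positive means growth).
predicted = [0.5, -0.2, 0.1, -0.4]
observed = [0.3, -0.1, -0.2, -0.5]

rho = pearson(predicted, observed)
accuracy = direction_accuracy(predicted, observed)
```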

Trajectories show more detailed congruence

Formalizes intuition about drivers of influenza dynamics

| Model | Ep coefficient | HI coefficient | Freq error | Growth corr |
|---|---|---|---|---|
| Epitope only | 2.36 | -- | 0.10 | 0.57 |
| HI only | -- | 2.05 | 0.08 | 0.63 |
| Epitope + HI | -0.11 | 2.15 | 0.08 | 0.67 |

Further work on predictive modeling

  1. Integrate additional predictors and data sources, e.g. plan to investigate a geographic predictor
  2. Possible to build predictive models for H1N1 and B and to forecast NA evolution

Real-time analyses are actionable and thus may inform influenza vaccine strain selection

Outbreak analysis


Epidemic nearly contained, but resulted in >28,000 confirmed cases and >11,000 deaths

Outbreaks are independent spillovers from the animal reservoir

Person-to-person spread in the early West African outbreak

Continued spread through Dec 2014

At epidemic height, geographic spread of particular interest

Rambaut 2015

Later on, tracking transmission clusters of primary importance

Tracking epidemic spread in real-time:


Virus source in Africa, spread eastward

Isolated epidemics in the South Pacific

Single arrival into the Americas in early 2014

Working on analysis of ongoing evolution:

Moving forward, genetically-informed outbreak response requires:

  • Rapid sharing of sequence data, genetic context critical
  • Technologies to rapidly conduct phylogenetic inference
  • Technologies to explore genetic relationships and inform epidemiological investigation

Future work


WHO Global Influenza Surveillance Network, Richard Neher (Max Planck Tübingen), Andrew Rambaut (University of Edinburgh), Colin Russell (Cambridge University), Philippe Lemey (KU Leuven), Marc Suchard (UCLA), Steven Riley (Imperial College), Gytis Dudas (University of Edinburgh).



  • Website:
  • Twitter: @trvrb
  • Slides: