Real-time forecasting of influenza virus evolution

Trevor Bedford (@trvrb)
16 Jan 2018
CREST International Symposium on Big Data Applications
Tokyo, Japan

Goal: forecast the makeup of the future flu population from the population that exists today

Population turnover (in H3N2) is extremely rapid

Clades emerge, die out and take over

Clades show rapid turnover

Dynamics driven by antigenic drift

Drift variants emerge and rapidly take over in the virus population

Drift necessitates vaccine updates

H3N2 vaccine updates occur every ~2 years

Timely surveillance and rapid analysis essential to vaccine strain selection


Project to provide a real-time view of the evolving influenza population


All in collaboration with Richard Neher

nextflu pipeline

  1. Download all recent HA sequences from GISAID
  2. Filter to remove outliers
  3. Subsample across time and space
  4. Align sequences
  5. Build tree
  6. Estimate clade frequencies
  7. Infer antigenic phenotypes
  8. Export for visualization
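
Step 3 of the list above can be illustrated with a minimal sketch. It assumes each sequence record carries a region and a collection date, and it keeps a fixed number of sequences per region and month. The field names, the `per_group` parameter and the example strain names are hypothetical and are not taken from the nextflu code.

```python
from collections import defaultdict

def subsample(records, per_group=2):
    """Keep at most `per_group` records per (region, year-month) bin."""
    bins = defaultdict(list)
    for rec in records:
        key = (rec["region"], rec["date"][:7])  # "YYYY-MM" prefix of an ISO date
        bins[key].append(rec)
    kept = []
    for key in sorted(bins):
        # Deterministic choice by name, for illustration only; a real
        # pipeline might instead sample randomly or by sequence quality.
        kept.extend(sorted(bins[key], key=lambda r: r["name"])[:per_group])
    return kept

# Hypothetical records, invented for this example
records = [
    {"name": "A/Tokyo/1/2017", "region": "japan_korea", "date": "2017-11-03"},
    {"name": "A/Tokyo/2/2017", "region": "japan_korea", "date": "2017-11-20"},
    {"name": "A/Tokyo/3/2017", "region": "japan_korea", "date": "2017-11-28"},
    {"name": "A/Seattle/1/2017", "region": "north_america", "date": "2017-11-15"},
    {"name": "A/Seattle/2/2017", "region": "north_america", "date": "2017-12-02"},
]
kept = subsample(records, per_group=2)
```

Binning before sampling keeps heavily sequenced regions and months from dominating the tree.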

Up-to-date analysis publicly available at:

Antigenic analysis

Influenza hemagglutination inhibition (HI) assay

HI measures cross-reactivity across viruses

Data take the form of a table of maximum inhibitory titers

Antigenic cartography compresses HI measurements into an interpretable diagram

Instead of a geometric model, we sought a phylogenetic model of HI titer data

Identify phylogeny branches associated with drops in HI titer

Model can be used to interpolate across tree and predict phenotype of untested viruses
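
A minimal sketch of the interpolation idea, assuming the predicted titer drop between two viruses is the sum of per-branch drops along the tree path that connects them. The toy tree, the `branch_drop` values and the function names are hypothetical, and the sketch leaves out any virus-specific or serum-specific effects that a full model might include.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def predicted_titer_drop(a, b, parent, branch_drop):
    """Sum branch-specific drops on the path between tips a and b.

    Each branch is identified by its child node; branches without an
    entry in `branch_drop` are assumed to contribute no drop.
    """
    ancestors_b = set(path_to_root(b, parent))
    total = 0.0
    mrca = None
    for node in path_to_root(a, parent):   # climb from a to the common ancestor
        if node in ancestors_b:
            mrca = node
            break
        total += branch_drop.get(node, 0.0)
    for node in path_to_root(b, parent):   # climb from b to the common ancestor
        if node == mrca:
            break
        total += branch_drop.get(node, 0.0)
    return total

# Toy tree: root -> n1 -> (A, B), root -> C
parent = {"A": "n1", "B": "n1", "n1": "root", "C": "root"}
branch_drop = {"n1": 2.0, "C": 1.0}  # hypothetical drops in log2 titer units

drop_ab = predicted_titer_drop("A", "B", parent, branch_drop)
drop_ac = predicted_titer_drop("A", "C", parent, branch_drop)
```

Because the drops live on branches, any virus placed on the tree inherits a predicted phenotype from its position, even if it was never tested in an HI assay.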

Model is highly predictive of missing titer values

Incorporate HI data from US Centers for Disease Control and Prevention

Up-to-date analysis at:


"The future is here, it's just not evenly distributed yet"
— William Gibson

USA music industry, 2011 dollars per capita

Influenza population turnover

Vaccine strain selection timeline

Seek to explain change in clade frequencies over 1 year

Fitness models can project clade frequencies

Clade frequencies $X$ derive from the fitnesses $f$ and frequencies $x$ of constituent viruses, such that

$$\hat{X}_v(t+\Delta t) = \sum_{i:v} x_i(t) \, \mathrm{exp}(f_i \, \Delta t)$$

This captures clonal interference between competing lineages
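
The projection can be sketched in a few lines, assuming each virus is a record with a clade label, a current frequency and a fitness. The final renormalization, so that projected frequencies sum to one, is an assumption of this sketch that the formula above leaves implicit. All values are hypothetical.

```python
import math

def project_clade_frequencies(viruses, dt=1.0):
    """Project clade frequencies forward by `dt` under exponential growth."""
    raw = {}
    for v in viruses:
        # x_i(t) * exp(f_i * dt), accumulated over the viruses in each clade
        growth = v["freq"] * math.exp(v["fitness"] * dt)
        raw[v["clade"]] = raw.get(v["clade"], 0.0) + growth
    total = sum(raw.values())
    # Renormalize so that frequencies sum to one (assumption of this sketch)
    return {clade: x / total for clade, x in raw.items()}

# Hypothetical population with two competing clades
viruses = [
    {"clade": "A", "freq": 0.6, "fitness": 0.0},
    {"clade": "B", "freq": 0.3, "fitness": 1.0},
    {"clade": "B", "freq": 0.1, "fitness": 0.5},
]
projected = project_clade_frequencies(viruses, dt=1.0)
```

In this toy case clade A declines even though its fitness is not negative, because clade B grows faster. That is the interference effect in miniature.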

The question of forecasting becomes: how do we accurately estimate fitnesses of circulating viruses?

Fortunately, there's lots of training data, and previously successful strains have had:

  1. Amino acid changes at epitope sites
  2. Antigenic novelty based on HI
  3. Rapid phylogenetic growth

Predictor: calculate HI drop from ancestor,
drifted clades have high fitness

Predictor: project frequencies forward,
growing clades have high fitness

We predict fitness based on a simple formula

The fitness $f$ of virus $i$ is estimated as

$$\hat{f}_i = \beta^\mathrm{HI} \, f_i^\mathrm{HI} + \beta^\mathrm{freq} \, f_i^\mathrm{freq}$$

where $f_i^\mathrm{HI}$ measures antigenic drift via HI and $f_i^\mathrm{freq}$ measures clade growth/decline
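
As a worked example with purely hypothetical numbers (not fitted values), take $\beta^\mathrm{HI} = 1.0$, $\beta^\mathrm{freq} = 0.5$, $f_i^\mathrm{HI} = 1.2$ and $f_i^\mathrm{freq} = -0.4$:

$$\hat{f}_i = 1.0 \times 1.2 + 0.5 \times (-0.4) = 1.0$$

Under these assumed values, an antigenically drifted virus is still assigned positive fitness even though its clade has recently been declining.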

We learn coefficients and validate the model based on the previous 15 H3N2 seasons
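
One way to learn such coefficients is sketched below. It assumes a plain grid search that minimizes the squared error between projected and observed clade frequencies across past seasons. The fitting procedure, the grid, the renormalized projection and the toy seasons are all assumptions of this sketch, not the method used in the talk.

```python
import math
from itertools import product

def project(viruses, beta_hi, beta_freq, dt=1.0):
    """Project normalized clade frequencies from the two fitness predictors."""
    raw = {}
    for v in viruses:
        fitness = beta_hi * v["f_hi"] + beta_freq * v["f_freq"]
        raw[v["clade"]] = raw.get(v["clade"], 0.0) + v["freq"] * math.exp(fitness * dt)
    total = sum(raw.values())
    return {clade: x / total for clade, x in raw.items()}

def fit_coefficients(seasons, grid):
    """Return (error, beta_hi, beta_freq) with the lowest squared error."""
    best = None
    for beta_hi, beta_freq in product(grid, repeat=2):
        err = 0.0
        for viruses, observed in seasons:
            predicted = project(viruses, beta_hi, beta_freq)
            err += sum((predicted[c] - observed[c]) ** 2 for c in observed)
        if best is None or err < best[0]:
            best = (err, beta_hi, beta_freq)
    return best

# Two hypothetical seasons: clades differ in f_hi in the first, f_freq in the second
season1 = [
    {"clade": "A", "freq": 0.5, "f_hi": 1.0, "f_freq": 0.0},
    {"clade": "B", "freq": 0.5, "f_hi": 0.0, "f_freq": 0.0},
]
season2 = [
    {"clade": "A", "freq": 0.5, "f_hi": 0.0, "f_freq": 1.0},
    {"clade": "B", "freq": 0.5, "f_hi": 0.0, "f_freq": 0.0},
]
# Simulate "observed" outcomes from known coefficients, then try to recover them
seasons = [(s, project(s, 1.0, 0.5)) for s in (season1, season2)]
best = fit_coefficients(seasons, grid=[0.0, 0.5, 1.0, 1.5])
```

Validation would then hold out seasons that were not used for fitting and compare the projected trajectories against what actually happened.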

Clade growth rate is well predicted (ρ = 0.66)

Growth vs decline correct in 84% of cases
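
Both summary statistics can be computed from paired predicted and observed clade growth rates. The sketch below assumes that ρ denotes a rank (Spearman) correlation, which the slide does not specify. The growth values are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, averaging ranks across tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def sign_accuracy(predicted, observed):
    """Fraction of clades whose growth-vs-decline direction is predicted correctly."""
    hits = sum((p > 0) == (o > 0) for p, o in zip(predicted, observed))
    return hits / len(predicted)

# Hypothetical growth rates for five clades
predicted = [0.5, -0.2, 0.1, -0.4, 0.3]
observed = [0.4, -0.1, -0.05, -0.3, 0.6]
rho = spearman(predicted, observed)
accuracy = sign_accuracy(predicted, observed)
```

The two measures answer different questions: the correlation asks whether the ordering of clade growth rates is right, and the sign accuracy asks only whether each clade is called as growing or declining.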

Trajectories show more detailed congruence

When does the forecast fail?

Emerging clades are difficult to forecast: little antigenic data and little evidence of "past performance"

Models work well for clades at >10% frequency, but less well for clades at <5%

New mutations difficult

Models can project forward from circulating strains, but cannot foresee the appearance of new mutations

Intrinsically limits the timescale of forecasting to ~1 year

Model is only as good as the data

Requires rapid shipping of samples, rapid sequencing and rapid antigenic characterization

Current situation

Further improvements to predictive modeling

  1. Extend to other seasonal viruses
  2. Forecast NA evolution
  3. Integrate neutralization (FRA) assay data
  4. Model effects of egg adaptation
  5. Incorporate an explicit geographic model

Real-time analyses are actionable and may inform influenza vaccine strain selection

More generally, real-time analyses may be useful for other viruses


Zika's arrival and spread in the Americas

Establishment and cryptic transmission of Zika virus in Brazil and the Americas

with Nuno Faria, Nick Loman, Oli Pybus, Luiz Alcantara, Ester Sabino, Josh Quick,
Alli Black, Ingra Morales, Julien Thézé, Marcio Nunes, Jacqueline de Jesus,
Marta Giovanetti, Moritz Kraemer, Sarah Hill and many others

Road trip through northeast Brazil to collect samples and sequence

Case reports and diagnostics suggest initiation in northeast Brazil

Phylogeny infers an origin in northeast Brazil

These analyses are important; let's make them more rapid and more automated


Project to conduct real-time molecular epidemiology and evolutionary analysis of emerging epidemics

with Richard Neher, James Hadfield, Colin Megill,
Sidney Bell, Charlton Callender, Barney Potter,
and John Huddleston

Nextstrain architecture

All code open source at

Rapid on-the-ground sequencing by Ian Goodfellow, Matt Cotten and colleagues

Build out pipelines for different pathogens, improve databasing and lower bioinformatics bar


Bedford Lab: Alli Black, Sidney Bell, Gytis Dudas, John Huddleston,
Barney Potter, James Hadfield, Louise Moncla

Influenza: WHO Global Influenza Surveillance Network, Richard Neher, Colin Russell, Andrew Rambaut, Dave Wentworth, Becky Garten, Marta Łuksza, Michael Lässig

Zika: Nick Loman, Nuno Faria, Oli Pybus, Josh Quick, Kristian Andersen, Nathan Grubaugh, Jason Ladner, Gustavo Palacios, Sharon Isern, Gytis Dudas, Alli Black, Barney Potter, Esther Ellis, Louise Moncla, Diana Rojas

Nextstrain: Richard Neher, James Hadfield, Colin Megill, Sidney Bell, Charlton Callender, Barney Potter, John Huddleston, Emma Hodcroft