Dengue antigenic relationships predict evolutionary dynamics

About the fitness model

In antigenically diverse viral populations, antigenic novelty (relative to standing population immunity) contributes to viral fitness. As a given virus i circulates in a population, the proportion of the population that is susceptible to infection with i (and with other viruses antigenically similar to i) decreases over time as more people acquire immunity. Antigenically novel viruses that escape this population immunity are better able to infect hosts and sustain transmission chains, making them fitter than the previously circulating viruses. Thus, if antigenic novelty constitutes a fitness advantage for DENV, we would expect greater antigenic distance from recently circulating viruses to correlate with higher growth rates. We use a simple model, adapted from Luksza and Lassig (Nature, 2014) and illustrated below, to estimate population immunity and viral fitness. We then use viral fitness to predict clade frequencies and growth rates.

Fitness model overview
A: The relative frequency of each serotype in Southeast Asia is estimated every three months based on available sequence data.
B: We calculate antigenic fitness for each serotype over time as its frequency-weighted antigenic distance from recently circulating viruses.
C: How the model predicts clade growth rates. At each timepoint t, we blind the model to all empirical data from timepoints later than t and predict each serotype's future trajectory based on its initial frequency, time-invariant intrinsic fitness, and antigenic fitness at time t. We predict forward in three-month increments for a total prediction period of dt = 5 years. At each increment, we use the predicted stepwise frequency change to adjust our estimates of antigenic fitness on a rolling basis. Predicted growth rates are calculated as the predicted final frequency over the actual initial frequency. These predicted growth rates are compared to empirically observed growth rates to assess model performance. The example illustrated in C is also shown in D as the blue point.
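The forward prediction described for panel C can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function name, the `fitness_fn` callback, and the exponential-growth-then-renormalize update rule are all assumptions made for the example.

```python
import math

def predict_growth_rates(initial_freqs, fitness_fn, years=5.0, step=0.25):
    """Project clade frequencies forward and return predicted growth rates.

    initial_freqs: dict of clade name -> observed frequency at time t.
    fitness_fn: hypothetical callback that takes the current predicted
        frequencies and returns a dict of clade name -> fitness per year
        (standing in for intrinsic plus antigenic fitness).
    """
    freqs = dict(initial_freqs)
    n_steps = int(round(years / step))  # e.g. 20 three-month steps in 5 years
    for _ in range(n_steps):
        # Fitness is re-evaluated from the predicted frequencies at each step.
        fitness = fitness_fn(freqs)
        # Assumed update rule: exponential growth over one step, then renormalize.
        grown = {c: x * math.exp(fitness[c] * step) for c, x in freqs.items()}
        total = sum(grown.values())
        freqs = {c: x / total for c, x in grown.items()}
    # Growth rate = predicted final frequency / actual initial frequency.
    return {c: freqs[c] / x0 for c, x0 in initial_freqs.items() if x0 > 0}
```

Passing a callback keeps the stepping logic separate from whatever fitness definition is being tested, so the same loop can be reused for a null model (constant fitness) and for an antigenic model.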

Running the fitness model

Prepare input files

1 - Estimate empirical clade frequencies over time by running augur.
2 - Parse clade frequencies to identify which clades correspond to serotypes and genotypes of interest.
3 - Parse the titer tree (also output from augur) to estimate the antigenic distance between each pair of clades (sum values of dTiter for each branch that lies between the two clades on the tree).
N.B.: For this dataset, steps 1-3 have already been completed; prepared input files for the fitness model can be found here
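As an illustration of step 3, the sketch below sums dTiter along the tree path between two nodes. It assumes each clade is represented by a single node and that the tree is available as plain dictionaries; the names `parent` and `dtiter` are hypothetical and do not reflect the augur output format.

```python
def antigenic_distance(parent, dtiter, a, b):
    """Sum dTiter over every branch on the tree path between nodes a and b.

    parent: dict mapping each node to its parent (the root maps to None).
    dtiter: dict mapping each node to the dTiter of the branch above it.
    """
    # Record, for every ancestor of a (including a itself), the dTiter
    # accumulated on the way up from a to that ancestor.
    up_from_a = {}
    node, total = a, 0.0
    while node is not None:
        up_from_a[node] = total
        total += dtiter.get(node, 0.0)
        node = parent[node]
    # Walk up from b until reaching an ancestor shared with a.
    node, total = b, 0.0
    while node not in up_from_a:
        total += dtiter.get(node, 0.0)
        node = parent[node]
    return up_from_a[node] + total
```

For example, with a root that has children A (dTiter 1.0) and B (dTiter 2.0), and a node a1 below A (dTiter 0.5), the distance between a1 and B is the sum of those three branches.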

Fit model parameters

4 - Explore parameter space
(Assuming you are on a standard scientific computing cluster using the Slurm queue manager.)
helpers_scripts$ python clade_resolution antigenic_resolution
clade_resolution should be either serotype or genotype.
antigenic_resolution should be either interserotype_model or fulltree_model.

5 - Examine model performance
Load the output file, model_performance.csv, into the visualization notebook here to visualize model performance as defined by a variety of optimization metrics.
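Outside the notebook, a quick way to rank parameter combinations is to sort the rows of model_performance.csv by one metric. The sketch below is only an illustration: the column layout of that file is not documented here, so the metric column name is left as an argument that you supply, and whether larger or smaller is better depends on the metric.

```python
import csv

def best_rows(path, metric_column, maximize, top=5):
    """Return the `top` rows of a CSV file ranked by one numeric column.

    metric_column: name of the column to rank by (supplied by the caller;
        the actual column names in model_performance.csv are not assumed).
    maximize: True if larger values are better, False if smaller are better.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    rows.sort(key=lambda row: float(row[metric_column]), reverse=maximize)
    return rows[:top]
```

Each returned row is a dictionary keyed by the file's own header names, so the parameter values for the best-scoring combinations can be read off directly.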

N.B.: For this dataset, steps 4-5 have already been completed; model performance for the entire parameter space can be found here. Optimized parameter values are shown below:

beta: Slope of linear relationship between population immunity and viral fitness
gamma: Slope of linear relationship between titers and probability of protection
sigma: Proportion of titers waning each year since primary infection
f_{s0}: Relative intrinsic fitness of each serotype (f_0 = 0 for DENV4)
N: Number of years of previous immunity that contribute to antigenic fitness
dt: Number of years in the future to predict clade frequencies

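| Genetic resolution | Antigenic resolution | Metric | Metric value | beta | gamma | sigma | DENV1 f_0 | DENV2 f_0 | DENV3 f_0 | DENV4 f_0 |
|---|---|---|---|---|---|---|---|---|---|---|
| Serotype | Interserotype | Delta SSE | 15.02 | 2.57 | 0.57 | 0.86 | 4.57 | 3.43 | 2.14 | 0.00 |
| Serotype | Interserotype | Pearson R2 | 0.63 | 2.57 | 0.57 | 0.86 | 3.43 | 2.29 | 0.71 | 0.00 |
| Genotype | Interserotype | Delta SSE | 14.83 | 2.57 | 0.57 | 0.86 | 5.71 | 4.57 | 3.57 | 0.00 |
| Genotype | Interserotype | Pearson R2 | 0.36 | 2.57 | 0.57 | 0.86 | 5.71 | 5.71 | 2.86 | 0.00 |
| Genotype | Full tree | Delta SSE | 14.22 | 1.71 | 0.57 | 0.43 | 1.40 | 0.80 | 0.40 | 0.00 |
| Genotype | Full tree | Pearson R2 | 0.33 | 1.29 | 0.57 | 0.43 | 1.40 | 1.60 | 0.40 | 0.00 |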
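To make the roles of beta, gamma, sigma, f_0, and N more concrete, here is one plausible reading of the parameter definitions above, written as a small sketch. The functional forms (linear protection clipped at zero, linear waning per year, immunity summed over the previous N years) and all function names are assumptions for illustration; they are not taken from the repository's code.

```python
def protection(distance, gamma, sigma, years_since):
    """Assumed probability of protection against a virus at a given
    antigenic distance, years_since years after infection."""
    # Assumption: protection falls linearly with antigenic distance (slope gamma).
    base = max(0.0, 1.0 - gamma * distance)
    # Assumption: a fraction sigma of protection wanes per year since infection.
    remaining = max(0.0, 1.0 - sigma * years_since)
    return min(1.0, base * remaining)

def antigenic_fitness(clade, past_freqs, distances, f0, beta, gamma, sigma):
    """Fitness of `clade` given what circulated over the previous N years.

    past_freqs: list of dicts; past_freqs[n - 1] holds clade frequencies
        n years ago, so N is simply len(past_freqs).
    distances: nested dict, distances[i][j] = antigenic distance from j to i.
    f0: dict of clade name -> intrinsic fitness.
    """
    immunity = 0.0
    for n, freqs in enumerate(past_freqs, start=1):
        for other, x in freqs.items():
            immunity += x * protection(distances[clade][other], gamma, sigma, n)
    # Assumption: fitness declines linearly with population immunity (slope beta).
    return f0[clade] - beta * immunity
```

Under these assumptions, a clade that is antigenically distant from whatever dominated recent seasons accumulates little population immunity against it and therefore ends up with a higher fitness, which matches the intuition described in the model overview.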

Run the fitness model

6 - Run with -h to see all available options. Running with the default options runs the serotype model with parameters fit to optimize delta_sse.

7 - Explore model output and visualize results with this notebook