Dengue antigenic relationships predict evolutionary dynamics

About the fitness model

In viral populations with meaningful antigenic diversity, antigenic novelty (relative to standing population immunity) contributes to viral fitness. As a given virus i circulates in a population, the proportion of the population that is susceptible to infection with i (and with other viruses antigenically similar to i) decreases over time as more people acquire immunity. Antigenically novel viruses that escape this population immunity are better able to infect hosts and sustain transmission chains, which makes them fitter than the viruses that circulated before them. Thus, if antigenic novelty confers a fitness advantage for DENV, we would expect greater antigenic distance from recently circulating viruses to correlate with higher growth rates. We use a simple model, adapted from Łuksza and Lässig (Nature, 2014) and illustrated below, to estimate population immunity and viral fitness. We then use viral fitness to predict clade frequencies and growth rates.
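The last step can be made concrete with a minimal sketch of the standard Łuksza and Lässig projection. It assumes each clade's frequency grows as x_i(t + dt) ∝ x_i(t) exp(f_i dt), renormalized across clades, and it takes the growth rate to be the ratio of projected to current frequency. The function names and the two-clade example are hypothetical, and the repository's own implementation may differ in detail.

```python
import math


def project_frequencies(freqs, fitness, dt):
    """Project clade frequencies dt years ahead: x_i(t+dt) proportional to x_i(t) * exp(f_i * dt)."""
    weights = {clade: x * math.exp(fitness[clade] * dt) for clade, x in freqs.items()}
    total = sum(weights.values())
    # Renormalize so the projected frequencies sum to 1
    return {clade: w / total for clade, w in weights.items()}


def growth_rates(freqs, projected):
    """Growth rate taken as projected / current frequency (zero-frequency clades skipped)."""
    return {clade: projected[clade] / x for clade, x in freqs.items() if x > 0}


# Hypothetical example: two equally common clades, clade B fitter by one unit
current = {"A": 0.5, "B": 0.5}
fitness = {"A": 0.0, "B": 1.0}
projected = project_frequencies(current, fitness, dt=1.0)
rates = growth_rates(current, projected)
```

Here the fitter clade B grows (rate above 1) at the expense of A (rate below 1). Only relative fitness matters, because adding the same constant to every f_i cancels in the normalization.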

Running the fitness model

Prepare input files

1 - Estimate empirical clade frequencies over time by running augur.
2 - Parse clade frequencies to identify which clades correspond to serotypes and genotypes of interest.
3 - Parse the titer model output (also produced by augur) to estimate the antigenic distance between each pair of clades.
N.B.: For this dataset, steps 1-3 have already been completed; the prepared input files for the fitness model can be found here.
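Step 3 can be sketched under the assumption of a titer tree model. In that setup each branch carries a titer drop, and the antigenic distance between two clades is the sum of those drops along the tree path between them; virus avidity and serum potency terms are ignored here. Several choices are assumptions for illustration, not the format of the augur output: each clade is represented by the node at its base, the tree is a child-to-parent dictionary, and `dtiter` is a hypothetical mapping from each node to the drop on the branch leading into it.

```python
def ancestors(node, parent):
    """Return [node, parent(node), ..., root] using a child -> parent mapping."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def titer_distance(a, b, parent, dtiter):
    """Sum the titer drops on every branch of the tree path between nodes a and b."""
    on_b_path = set(ancestors(b, parent))
    # Most recent common ancestor: first node on a's root path that is also on b's
    mrca = next(n for n in ancestors(a, parent) if n in on_b_path)
    total = 0.0
    for node in (a, b):
        while node != mrca:
            total += dtiter.get(node, 0.0)  # drop on the branch leading into `node`
            node = parent[node]
    return total


def pairwise_distances(clades, parent, dtiter):
    """Distance for every ordered pair of clades (symmetric under this tree model)."""
    return {(i, j): titer_distance(i, j, parent, dtiter) for i in clades for j in clades}


# Hypothetical tree: root -> X -> {a, b}, and root -> Y
parent = {"X": "root", "Y": "root", "a": "X", "b": "X"}
dtiter = {"X": 1.0, "Y": 0.5, "a": 0.25, "b": 0.0}
distances = pairwise_distances(["a", "b", "Y"], parent, dtiter)
```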

Fit model parameters

4 - Explore parameter space
First, set up your parameter grid for beta, gamma and sigma here.
Then, set up your parameter grid for DENV1_f0, DENV2_f0 and DENV3_f0 here.
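Before jobs are dispatched, a grid like this is expanded into one parameter set per combination. The sketch below does that with `itertools.product`. The `expand_grid` helper and the example values are hypothetical, and whether the two grids are fully crossed in the actual scripts is an assumption. A full cross grows quickly: with 7 values for each of the six parameters it would contain 7^6 = 117,649 sets, which is why running them in parallel matters.

```python
import itertools


def expand_grid(grid):
    """Turn {name: [values, ...]} into one {name: value} dict per combination."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in itertools.product(*(grid[n] for n in names))]


# Hypothetical example grids (parameter names from this README, values illustrative)
immunity_grid = {"beta": [0.5, 1.0], "gamma": [0.5, 1.0], "sigma": [0.0, 0.5]}
f0_grid = {"DENV1_f0": [0.0, 1.0], "DENV2_f0": [0.0, 1.0], "DENV3_f0": [0.0, 1.0]}

# Assumption: the two grids are fully crossed
combined = dict(immunity_grid)
combined.update(f0_grid)
param_sets = expand_grid(combined)
print(len(param_sets))  # 2**6 = 64
```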

Once your grid is configured, run each parameter set in parallel on Slurm.
Assuming you are on a standard scientific computing cluster that uses the Slurm workload manager:
module load python2 # for the Hutch cluster, loads all required packages

helper_scripts$ python sbatch_param_wrapper.py clade_resolution antigenic_resolution

clade_resolution should be either serotype or genotype.
antigenic_resolution should be either interserotype or fulltree (if clade_resolution is serotype, these two options are equivalent).
This launches all of the jobs and writes the results to a set of CSV files in ./southeast_asia/genetic_resolution/antigenic_resolution/.
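The argument rules above can be written down as a small argparse sketch for reference. It is not the wrapper's actual code: `parse_args` and `output_dir` are hypothetical. The output path also assumes that the `genetic_resolution` placeholder in the directory name is filled with the `clade_resolution` value.

```python
import argparse
import os


def parse_args(argv=None):
    """Validate the two positional arguments described above (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the parameter-sweep arguments")
    parser.add_argument("clade_resolution", choices=["serotype", "genotype"])
    parser.add_argument("antigenic_resolution", choices=["interserotype", "fulltree"])
    return parser.parse_args(argv)


def output_dir(args, base="./southeast_asia"):
    """Assumed results directory: <base>/<clade_resolution>/<antigenic_resolution>/."""
    return os.path.join(base, args.clade_resolution, args.antigenic_resolution)


# Example: output_dir(parse_args(["serotype", "interserotype"]))
```

Any value outside the listed choices makes argparse print a usage message and exit, so a typo is caught before any jobs are submitted.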

Once the jobs have finished (~2-3 hours at serotype resolution), run:
output_directory$ python ../../../helper_scripts/collate_jobs.py
This generates a single collated output file, model_performance.csv.
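The collation idea can also be sketched with the standard library. The approach is to read every per-job CSV in the directory, take the union of their columns, and write a single file. `merge_rows` and `collate` are hypothetical names, and this is not a description of what `collate_jobs.py` does internally.

```python
import csv
import glob
import os


def merge_rows(sources):
    """Merge CSV sources (each an iterable of lines) into one column list and one row list."""
    fieldnames, rows = [], []
    for lines in sources:
        reader = csv.DictReader(lines)
        for name in reader.fieldnames or []:
            if name not in fieldnames:
                fieldnames.append(name)  # union of columns, in first-seen order
        rows.extend(reader)
    return fieldnames, rows


def collate(directory, out_name="model_performance.csv"):
    """Write every per-job CSV in `directory` to one collated file and return its path."""
    out_path = os.path.join(directory, out_name)
    paths = sorted(p for p in glob.glob(os.path.join(directory, "*.csv"))
                   if os.path.basename(p) != out_name)  # skip a previous collation
    sources = []
    for path in paths:
        with open(path, newline="") as handle:
            sources.append(list(handle))
    fieldnames, rows = merge_rows(sources)
    with open(out_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)  # missing cells written as ""
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```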

I typically start with a coarse first pass: np.linspace(0,3,7) for beta, gamma and sigma, and np.linspace(0,5,7) for the DENV*_f0 parameters. I then inspect the output (see below) and center a finer grid around the "best" value, i, of each parameter found in the coarse grid, using np.linspace(i-0.45, i+0.45, 7) so that values are 0.15 apart.
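The coarse and fine grids described above can be generated as follows. `linspace` mirrors `np.linspace(start, stop, num)` with both endpoints included, so the sketch runs without NumPy. `fine_grid` is a hypothetical helper.

```python
def linspace(start, stop, num):
    """Evenly spaced values with both endpoints, like np.linspace(start, stop, num); num >= 2."""
    step = (stop - start) / (num - 1)
    return [start + k * step for k in range(num - 1)] + [stop]


# Coarse first pass, as described above
coarse_immunity = linspace(0, 3, 7)  # beta, gamma, sigma: 0.0, 0.5, ..., 3
coarse_f0 = linspace(0, 5, 7)        # DENV*_f0: step of 5/6


def fine_grid(best, half_width=0.45, num=7):
    """Finer grid centered on the best coarse value; 7 points over +/-0.45 are 0.15 apart."""
    return linspace(best - half_width, best + half_width, num)
```

If the best coarse value sits at the edge of the range (for example 0), the fine grid extends into negative values, so you may want to clip it. Whether that is appropriate depends on the parameter.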

5 - Examine model performance
Load the output file, model_performance.csv, into the visualization notebook here to plot model performance under a variety of optimization metrics.
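Outside the notebook, one quick way to pull the top-ranked parameter sets from `model_performance.csv` is to sort its rows by a single metric column. This sketch makes two assumptions to check against the actual file: the metric is stored as a column (for example `delta_sse`, the metric named in step 6), and lower values are better. `best_rows` is a hypothetical helper.

```python
import csv
import math


def best_rows(lines, metric, minimize=True, top=1):
    """Return the `top` rows ranked by the numeric column `metric`, skipping blanks and NaNs."""
    scored = []
    for row in csv.DictReader(lines):
        value = row.get(metric)
        if value in (None, ""):
            continue
        value = float(value)
        if math.isnan(value):
            continue  # e.g. a failed or undefined fit
        scored.append((value, row))
    scored.sort(key=lambda pair: pair[0], reverse=not minimize)
    return [row for _, row in scored[:top]]


# Usage (assumed column name):
# with open("model_performance.csv", newline="") as handle:
#     print(best_rows(handle, "delta_sse", top=5))
```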

| Parameter | Description |
|---|---|
| beta | Slope of the linear relationship between population immunity and viral fitness |
| gamma | Slope of the linear relationship between titers and probability of protection |
| sigma | Proportion of titers waning each year since primary infection |
| f_{s0} | Relative intrinsic fitness of each serotype s (f_0 = 0 for DENV4) |
| N | Number of years of previous immunity that contribute to antigenic fitness |
| dt | Number of years into the future for which clade frequencies are predicted |
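A minimal sketch of population immunity and fitness for a single clade shows how these parameters fit together. The functional forms are assumptions chosen to match the descriptions in the table, not taken from the repository:

- protection falls linearly with antigenic distance (slope gamma) and is clipped to [0, 1];
- waning adds sigma titer units of distance per year since infection;
- immunity sums frequency-weighted protection over the previous N years;
- fitness is f_{s0} - beta * immunity.

All names and example data are hypothetical. dt is not used here because it only enters when projecting frequencies forward.

```python
from dataclasses import dataclass


@dataclass
class FitnessParams:
    beta: float   # fitness cost per unit of population immunity
    gamma: float  # slope of protection vs. antigenic distance (assumed linear)
    sigma: float  # waning, as titer units added per year since infection (assumed form)
    f0: dict      # serotype -> intrinsic fitness; serotypes not listed (e.g. DENV4) get 0
    N: int        # years of past circulation that contribute to immunity


def protection(distance, years_ago, p):
    """Assumed: protection = 1 - gamma * (distance + sigma * years_ago), clipped to [0, 1]."""
    effective_distance = distance + p.sigma * years_ago
    return min(1.0, max(0.0, 1.0 - p.gamma * effective_distance))


def population_immunity(i, past_freqs, dist, p):
    """Frequency-weighted protection against clade i from clades seen 1..N years ago."""
    total = 0.0
    for n in range(1, p.N + 1):
        for j, x_j in past_freqs.get(n, {}).items():
            total += x_j * protection(dist[(i, j)], n, p)
    return total


def fitness(i, serotype_of, past_freqs, dist, p):
    """f_i = f_{s0} - beta * P_i for the serotype s of clade i."""
    return p.f0.get(serotype_of[i], 0.0) - p.beta * population_immunity(i, past_freqs, dist, p)


# Hypothetical two-clade example: A dominated last year, B is antigenically distinct
params = FitnessParams(beta=1.0, gamma=0.5, sigma=0.0, f0={"DENV1": 0.0, "DENV2": 0.0}, N=1)
serotype_of = {"A": "DENV1", "B": "DENV2"}
past_freqs = {1: {"A": 0.8, "B": 0.2}}  # years_ago -> {clade: frequency}
dist = {("A", "A"): 0.0, ("A", "B"): 1.0, ("B", "A"): 1.0, ("B", "B"): 0.0}
f_A = fitness("A", serotype_of, past_freqs, dist, params)
f_B = fitness("B", serotype_of, past_freqs, dist, params)
```

In this toy example the rarer, antigenically distinct clade B faces less immunity (0.6 vs 0.9) and so ends up fitter, which is the qualitative behavior the model is built to capture.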

Run the fitness model

6 - Run antigenic_fitness.py -h to see all available options. Running it with the default options runs the serotype model using parameter values fit to optimize delta_sse.

7 - Explore the model output and visualize the results with this notebook.