# About the fitness model

In viral populations with meaningful antigenic diversity, antigenic novelty (relative to standing population immunity) contributes to viral fitness. As a given virus *i* circulates in a population, the proportion of the population that is susceptible to infection with *i* (and with other viruses antigenically similar to *i*) decreases over time as more people acquire immunity. Antigenically novel viruses that escape this population immunity are better able to infect hosts and sustain transmission chains, making them fitter than previously circulating viruses. Thus, if antigenic novelty constitutes a fitness advantage for DENV, we would expect greater antigenic distance from recently circulating viruses to correlate with higher growth rates. We use a simple model, adapted from Luksza and Lassig (*Nature*, 2014) and illustrated below, to estimate population immunity and viral fitness. We then use viral fitness to predict clade frequencies and growth rates.

# Running the fitness model

### Prepare input files

*1 - Estimate empirical clade frequencies over time by running augur.*

*2 - Parse clade frequencies to identify which clades correspond to serotypes and genotypes of interest.*

*3 - Parse the titer model output (also output from augur) to estimate the antigenic distance between each pair of clades.*

**N.B.: For this dataset, steps 1-3 have already been completed; prepared input files for the fitness model can be found here**

### Fit model parameters

*4 - Explore parameter space*

First, set up your parameter grid for beta, gamma and sigma here.

Then, set up your parameter grid for DENV1_f0, DENV2_f0 and DENV3_f0 here.

Once your grid is configured, run each set in parallel on slurm.

Assuming you are on a standard scientific computing cluster using the slurm queue manager:

`module load python2` (on the Hutch cluster, this loads all required packages)

`helper_scripts$ python sbatch_param_wrapper.py clade_resolution antigenic_resolution`

`clade_resolution` should be either `serotype` or `genotype`.

`antigenic_resolution` should be either `interserotype` or `fulltree` (if the genetic resolution, i.e. `clade_resolution`, is `serotype`, these are equivalent).

This will launch all of the jobs for you and write the results to a set of csv files in `./southeast_asia/genetic_resolution/antigenic_resolution/`.

Once they are done (for serotypes, ~2-3 hours), simply run:

`output_directory$ python ../../../helper_scripts/collate_jobs.py`

This will generate a collated output file, `model_performance.csv`.

I typically start with a first pass over a coarse grid: `np.linspace(0,3,7)` for beta, gamma and sigma; `np.linspace(0,5,7)` for the `DENV*_f0` parameters. I then inspect the output (see below) and center a fine grid around the "best" parameter value, *i*, found in the coarse grid with `np.linspace(i-0.45, i+0.45, 7)`, such that values are 0.15 apart.
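As a rough illustration of the coarse and fine grids described above, here is a minimal sketch using the same `np.linspace` calls. The helper `fine_grid`, the variable names, and the example "best" value are hypothetical. How `sbatch_param_wrapper.py` actually combines the grids is not shown here, so the Cartesian product below is only an assumption made to show the size of the search.

```python
import itertools

import numpy as np

# Coarse first pass, using the ranges given above.
coarse_immunity = np.linspace(0, 3, 7)  # candidate values for beta, gamma and sigma (0.5 apart)
coarse_f0 = np.linspace(0, 5, 7)        # candidate values for each DENV*_f0 parameter


def fine_grid(best, half_width=0.45, n=7):
    """Return n evenly spaced values centered on `best` (0.15 apart by default)."""
    return np.linspace(best - half_width, best + half_width, n)


# Assumption: every (beta, gamma, sigma) combination is tried, i.e. 7**3 = 343 sets.
coarse_sets = list(itertools.product(coarse_immunity, repeat=3))

# Hypothetical best value from the coarse pass, for illustration only.
best_beta = 1.5
fine_beta = fine_grid(best_beta)
```

With seven values per parameter, each added parameter multiplies the number of jobs by seven, which is one reason to start coarse and refine only around promising values.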

*5 - Examine model performance*

*Load the output file, `model_performance.csv`, into the visualization notebook here to visualize model performance as defined by a variety of optimization metrics.*

**beta** Slope of linear relationship between population immunity and viral fitness

**gamma** Slope of linear relationship between titers and probability of protection

**sigma** Proportion of titers waning each year since primary infection

**f_{s0}** Relative intrinsic fitness of each serotype (f_0 = 0 for DENV4)

**N** Number of years of previous immunity that contribute to antigenic fitness

**dt** Number of years in the future to predict clade frequencies
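To make the role of **dt** concrete, the sketch below projects clade frequencies forward under exponential growth and then renormalizes, which is the general form used by Luksza and Lassig. It is an assumption about the overall approach, not a transcription of `antigenic_fitness.py`; the function name `project_frequencies` and all numbers are hypothetical.

```python
import math


def project_frequencies(freqs, fitness, dt):
    """Project clade frequencies dt years ahead under exponential growth.

    freqs   -- dict mapping clade name to current frequency
    fitness -- dict mapping clade name to estimated fitness (growth rate per year)
    dt      -- number of years ahead to predict
    """
    # Each clade grows (or shrinks) exponentially according to its fitness.
    grown = {clade: x * math.exp(fitness[clade] * dt) for clade, x in freqs.items()}
    # Renormalize so the predicted frequencies sum to 1.
    total = sum(grown.values())
    return {clade: x / total for clade, x in grown.items()}


# Made-up frequencies and fitness values, for illustration only.
current = {"DENV1": 0.4, "DENV2": 0.3, "DENV3": 0.2, "DENV4": 0.1}
fitness = {"DENV1": 0.5, "DENV2": 0.0, "DENV3": -0.2, "DENV4": 0.0}
predicted = project_frequencies(current, fitness, dt=2)
```

Under this form, only fitness differences between clades matter: two clades with equal fitness keep the same frequency ratio, whatever the value of dt.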

### Run the fitness model

6 - Run `antigenic_fitness.py -h` to see all available options. Simply running with the default options will run the serotype model with parameters fit to optimize `delta_sse`.

7 - Explore model output and visualize results with this notebook