Dengue antigenic relationships predict evolutionary dynamics

Titer model

About the titer model

Antigenic distances between pairs of dengue viruses are measured experimentally via neutralization titers. These titer values are noisy, and only a limited amount of titer data is available. If the antigenic heterogeneity observed in the raw data truly results from an underlying evolutionary process, we expect changes in antigenic phenotype to correspond to underlying changes in viral genotype.

The titer model maps changes in antigenic phenotype (titer drops) to specific branches in the viral phylogeny as described below. This allows us to directly quantify the extent to which observed phenotypic variation is explained by an underlying genetic evolutionary process. We can also use variations of the model formulation to directly compare competing hypotheses about the nature of dengue antigenic evolution.

The titer model pipeline has three main steps. First, we build a phylogeny of dengue virus sequences to establish the genetic relationships between viruses. Next, we infer how much antigenic change has occurred along each branch of the phylogeny by mapping titer changes to individual branches. This assigns each branch $b$ an antigenic distance $d_b$. Finally, we estimate the antigenic distance between every pair of viruses by tracing the path between them in the phylogeny and summing the branch-specific distances $d_b$ along that path.
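The path-summing step can be illustrated with a minimal Python sketch. It assumes a toy tree stored as a hypothetical parent dictionary (each node mapped to its parent, the root mapped to None) and a hypothetical branch_d dictionary holding $d_b$ for the branch directly above each node; neither structure is taken from the augur code. The distance climbs from each virus to their most recent common ancestor (MRCA), adding $d_b$ for every branch it crosses.

```python
from typing import Dict, List, Optional


def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [node, parent(node), ..., root]."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def antigenic_distance(a: str, b: str,
                       parent: Dict[str, Optional[str]],
                       branch_d: Dict[str, float]) -> float:
    """Sum d_b over every branch on the tree path between viruses a and b."""
    ancestors_of_b = set(path_to_root(b, parent))
    total = 0.0
    # Climb from a until reaching an ancestor of b: that node is the MRCA.
    node = a
    while node not in ancestors_of_b:
        total += branch_d[node]  # d_b of the branch above this node
        node = parent[node]
    mrca = node
    # Climb from b up to the same MRCA, adding its branches too.
    node = b
    while node != mrca:
        total += branch_d[node]
        node = parent[node]
    return total
```

For example, with parent = {"root": None, "X": "root", "A": "X", "B": "X", "C": "root"} and branch_d = {"X": 1.5, "A": 0.0, "B": 0.5, "C": 2.0}, the distance between A and B is 0.5 (they share ancestor X), and the distance between A and C is 0.0 + 1.5 + 2.0 = 3.5.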

To learn the values of $d_b$, we first split the dataset into training data (a random 90% of measurements) and test data (the remaining 10%). We then fit $d_b$ for each branch in the tree to the training data, subject to regularization. Parsimoniously, we expect antigenic change to occur through large changes on a few branches rather than through small changes on many branches; correspondingly, we place an exponential prior on $d_b$ (with $d_b \geq 0$), which favors solutions in which most values of $d_b$ are exactly 0. This is analogous to lasso regression, which identifies a few parameters with positive weights and sets the rest to 0. Additionally, some viruses have greater binding avidity and some sera are more potent than others; these 'row' (virus) and 'column' (serum) effects are given normally distributed priors and are taken into account when predicting titers. The model uses convex optimization to find the values of $d_b$ that minimize the sum of squared errors (SSE) between observed and predicted titers in the training data, plus the regularization penalties.
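One way to write this fitting step down, assuming an additive form like the one in Neher et al. (2016), is to predict each (log-scale) titer between virus $i$ and serum $j$ as $\hat{H}_{ij} = v_i + p_j + \sum_{b \in \mathrm{path}(i,j)} d_b$, where $v_i$ is the virus ('row') effect and $p_j$ the serum ('column') effect. Taking negative logs, the exponential prior on $d_b$ becomes a linear penalty $\lambda \sum_b d_b$ and the normal priors become squared penalties $\gamma (\sum_i v_i^2 + \sum_j p_j^2)$. The sketch below minimizes SSE plus these penalties with projected gradient descent in NumPy, swapped in for illustration in place of whatever convex solver the augur implementation uses. The function names, the branch-indicator matrix A, and the values of lam and gamma are all hypothetical.

```python
import numpy as np


def split_train_test(n_obs, test_frac=0.1, seed=0):
    """Randomly assign measurement indices to training (90%) and test (10%) sets."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_obs)
    n_test = int(round(test_frac * n_obs))
    return perm[n_test:], perm[:n_test]


def fit_titer_model(A, virus_idx, serum_idx, titers, n_viruses, n_sera,
                    lam=1.0, gamma=1.0, n_iter=5000):
    """Fit d_b >= 0 (linear penalty) plus virus/serum effects (squared penalty).

    A[m, b] = 1 if branch b lies on the tree path for measurement m, else 0.
    Minimizes SSE + lam * sum(d) + gamma * (sum(v**2) + sum(p**2)) by
    projected gradient descent (a stand-in for a dedicated convex solver).
    """
    n_obs, n_branches = A.shape
    rows = np.arange(n_obs)
    V = np.zeros((n_obs, n_viruses))
    V[rows, virus_idx] = 1.0  # one-hot virus ("row") indicator
    P = np.zeros((n_obs, n_sera))
    P[rows, serum_idx] = 1.0  # one-hot serum ("column") indicator
    M = np.hstack([A, V, P])

    n_params = M.shape[1]
    l1 = np.zeros(n_params)
    l1[:n_branches] = lam     # linear penalty on d_b (exponential prior)
    l2 = np.zeros(n_params)
    l2[n_branches:] = gamma   # squared penalty on v and p (normal prior)

    # Step size 1/L, where L bounds the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 2 + 2.0 * gamma)
    theta = np.zeros(n_params)
    for _ in range(n_iter):
        resid = M @ theta - titers
        grad = 2.0 * M.T @ resid + l1 + 2.0 * l2 * theta
        theta -= step * grad
        # Project back onto the feasible set d_b >= 0.
        theta[:n_branches] = np.maximum(theta[:n_branches], 0.0)

    d = theta[:n_branches]
    v = theta[n_branches:n_branches + n_viruses]
    p = theta[n_branches + n_viruses:]
    return d, v, p


def predict_titers(A, virus_idx, serum_idx, d, v, p):
    """Predicted titer = virus effect + serum effect + sum of d_b along the path."""
    return A @ d + v[virus_idx] + p[serum_idx]
```

In use, A would have one row per measurement with a 1 in column b whenever branch b lies on the path between that measurement's virus and serum. Fitting on the training rows and scoring predict_titers on the held-out rows mirrors the 90/10 split described above; raising lam pushes more of the $d_b$ to exactly 0.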

Model variations

Figure: Titer model variations

A. The 'interserotype model' only allows branches that lie between serotypes to contribute to antigenic evolution; all other branches are assigned $d_b = 0$. B. The 'full tree model' allows any branch in the phylogeny to contribute to antigenic evolution ($d_b \geq 0$). C, D. Predictive performance of each model on the test data (aggregated over 10-fold cross-validation).
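Panels C and D aggregate test-set performance over 10-fold cross-validation, in which each fold holds out 10% of measurements so that every measurement is predicted exactly once while held out. A minimal sketch of that evaluation loop follows. It assumes a caller-supplied, hypothetical fit_predict(train_idx, test_idx) function that fits one model variant on the training rows and returns predictions for the held-out rows. The caption does not state which metrics are plotted, so the root-mean-square error and Pearson correlation computed here are illustrative choices.

```python
import numpy as np


def kfold_splits(n_obs, k=10, seed=0):
    """Partition measurement indices into k folds; each fold is held out once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_obs), k)
    splits = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        splits.append((train, folds[i]))
    return splits


def cross_validate(fit_predict, titers, k=10, seed=0):
    """Pool held-out predictions from all folds and summarize their accuracy.

    fit_predict(train_idx, test_idx) is a caller-supplied (hypothetical)
    function that fits a model on train_idx and returns predictions for test_idx.
    """
    titers = np.asarray(titers, dtype=float)
    predicted = np.empty_like(titers)
    for train_idx, test_idx in kfold_splits(len(titers), k, seed):
        predicted[test_idx] = fit_predict(train_idx, test_idx)
    resid = predicted - titers
    return {
        "rmse": float(np.sqrt(np.mean(resid ** 2))),
        "pearson_r": float(np.corrcoef(predicted, titers)[0, 1]),
    }
```

To compare the two variants, one would call cross_validate once with a fit_predict that only lets interserotype branches take nonzero $d_b$ and once with one that allows every branch, passing the same seed so that both models are scored on identical folds.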

Running the titer model

This model was originally published in Neher et al. (PNAS, 2016) and implemented as part of the Nextstrain augur package. The relevant portions of that repository are reproduced here under implementation-nextstrain-augur. Documentation for the full augur pipeline, including install instructions, can be found here.

1 - Prepare input files: this step handles subsampling and other basic dataset configuration. You can make any desired changes in the make_config function here.

cd dengue-antigenic-dynamics/titer_model/implementation-nextstrain-augur/builds/dengue/
mv ./prepared/dengue_all.json ./prepared/dengue_config.json

NB: For this dataset, step 1 has been run for you; see ./dengue_config.json

2 - Run the titer pipeline (and estimate clade frequencies): this step handles the actual analysis. Parameter settings and other options can be changed in the make_config function here.

Run python --titer_model full_tree for the "full tree" model.

Run python --titer_model interserotype for the "interserotype" model.

NB: For this dataset, step 2 has been run for you; see ./full-tree-model-output/ and ./interserotype-model-output/

3 - Check out your results

Results are written in JSON format to ./processed/. You can parse and examine the results using the notebooks found here.
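Because the results are plain JSON, they can also be inspected quickly without the notebooks. The sketch below loads every .json file in ./processed/ into a dictionary keyed by file name and reports the top-level structure of each. It assumes nothing about the file names or schema the pipeline writes; load_results and describe_results are hypothetical helpers, not part of augur.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def load_results(results_dir: str = "./processed") -> Dict[str, Any]:
    """Load every *.json file in results_dir, keyed by file name without extension."""
    results = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as fh:
            results[path.stem] = json.load(fh)
    return results


def describe_results(results: Dict[str, Any]) -> List[str]:
    """Summarize each loaded file: top-level keys for objects, type name otherwise."""
    lines = []
    for name, data in results.items():
        if isinstance(data, dict):
            lines.append(f"{name}: dict with keys {sorted(data)}")
        else:
            lines.append(f"{name}: {type(data).__name__}")
    return lines
```

Printing each line of describe_results(load_results()) gives a quick overview of what the pipeline produced before opening the notebooks.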

Additionally, the "full tree" model output is available at: