Constructing phylogenies for each portion of the lentiviral genome
To assess cross-species transmission, we first built a posterior distribution of ~2000-4000 trees for each segment of the genome (as identified by GARD). Each tip was labeled with the known host state; host state at internal nodes was inferred.
XML files can be regenerated by running
python path/to/siv-cst/scripts/beastSetup/empiricalTrees_makexml.py path/to/siv-cst/scripts/beastSetup/xmlTemplates/empiricalTrees_template.xml
(just adjust chain length and sampling frequency in the template per Methods for each dataset) from within the
Run with BEAST v.2.4.0 and dynamic BEAGLE scaling.
Trees represent the maximum clade credibility trees for each segment of the main dataset alignment after the discrete trait analysis (below), color coded by host state. Notably, the topologies vary widely between trees, emphasizing the extent of recombination and the variable selective pressures experienced by each region.
Using discrete trait analysis to identify ancient cross-species transmissions
In phylogenetic trees of viral sequences, cross-species transmission appears as a mismatch between the host of a virus and the host of that virus’s ancestor. Heuristically, in the trees above this appears as a change in color between an internal node and its descendants. To identify this pattern and estimate how frequently each pair of hosts has exchanged lentiviruses, we used the posterior distribution of phylogenies generated for each segment to estimate the rate of host state transition between each pair of hosts.
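The host-mismatch idea can be illustrated with a minimal sketch. This is not the discrete trait analysis itself (which estimates transition rates in BEAST across the whole posterior); it only counts parent-to-child host changes on a single, hand-written toy tree. The tree, node names, and host labels are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy tree: node -> (parent, host_state).
# Names and hosts are illustrative only, not taken from the dataset.
tree = {
    "root": (None, "chimp"),
    "n1": ("root", "chimp"),
    "n2": ("root", "mandrill"),
    "tipA": ("n1", "chimp"),
    "tipB": ("n1", "human"),
    "tipC": ("n2", "mandrill"),
    "tipD": ("n2", "mandrill"),
}


def count_host_transitions(tree):
    """Count branches whose child host differs from the parent host."""
    transitions = Counter()
    for node, (parent, host) in tree.items():
        if parent is None:
            continue  # the root has no ancestor to compare against
        parent_host = tree[parent][1]
        if parent_host != host:
            transitions[(parent_host, host)] += 1
    return transitions


transitions = count_host_transitions(tree)
for (source, sink), n in sorted(transitions.items()):
    print(f"{source} -> {sink}: {n}")
```

Each counted pair corresponds to a color change along one branch of the tree. The real analysis replaces this raw count with rates estimated jointly over all sampled trees.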
python path/to/siv-cst/scripts/beastSetup/rates_makexml.py path/to/siv-cst/scripts/beastSetup/xmlTemplates/rates_mastertemplate.xml
(again, just adjust the chain length and sampling for your dataset per the Methods) from within
Run with BEAST v.2.4.0 and dynamic BEAGLE scaling.
Parse results with scripts/beastAnalysis/parse_matrix.py. You may need to add the directory containing the included beastmatrix.py to your PYTHONPATH environment variable.
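If you would rather not change your shell environment, a rough equivalent is to extend the module search path from within Python before importing. The directory below is a placeholder and an assumption: point it at wherever beastmatrix.py actually lives in your checkout.

```python
import os
import sys

# Placeholder path (assumption): replace with the directory that
# actually contains beastmatrix.py in your copy of the repository.
module_dir = os.path.abspath("path/to/siv-cst/scripts/beastAnalysis")

# Put the directory at the front of the module search path, once.
if module_dir not in sys.path:
    sys.path.insert(0, module_dir)

# After this, `import beastmatrix` should resolve, provided the file
# is present in module_dir.
```

This only affects the current Python process, so it needs to run at the top of any script or session that imports the module.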
For the main dataset, we identify 14 novel cross-species transmission events with high certainty (Bayes factor >= 10, black arrows). We identify numerous other transmissions with weaker support (3 <= Bayes factor < 10, shaded in gray). Arrow width corresponds to the actual rate value averaged over posterior samples; opacity corresponds to the Bayes factor. Circle size for each tip corresponds to its network centrality score. The host mitochondrial DNA maximum likelihood phylogeny forms the outer circle. Raw text values for Bayes factors and actual rates can be seen for the main dataset here and visualized here.
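As a sketch of how the thresholds above could be applied to one host pair, the snippet below computes a Bayes factor as posterior odds divided by prior odds, plus a posterior-averaged rate. Several things here are assumptions rather than details taken from this repository: that each posterior sample carries a 0/1 indicator for whether the transition is included, that the "actual rate" is the rate multiplied by that indicator, and the prior inclusion probability. The sample values are made up; see parse_matrix.py for what the analysis really does.

```python
def bayes_factor(posterior_prob, prior_prob):
    """Posterior odds over prior odds that a transition rate is non-zero."""
    if posterior_prob >= 1.0:
        return float("inf")  # every sample includes the rate
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds


def support_category(bf):
    """Bin a Bayes factor using the thresholds quoted in the text."""
    if bf >= 10:
        return "BF >= 10"
    if bf >= 3:
        return "3 <= BF < 10"
    return "BF < 3"


# Made-up posterior samples for a single host pair (illustrative only).
indicators = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1]
rates = [0.8, 1.2, 1.0, 0.5, 0.9, 1.1, 1.0, 0.7, 1.3, 1.0]
prior_prob = 0.1  # assumed prior probability of inclusion

# Fraction of samples in which the transition is switched on.
posterior_prob = sum(indicators) / len(indicators)

# Assumed definition: actual rate = rate * indicator, averaged over samples.
actual_rates = [r * i for r, i in zip(rates, indicators)]
mean_actual_rate = sum(actual_rates) / len(actual_rates)

bf = bayes_factor(posterior_prob, prior_prob)
print(support_category(bf), round(mean_actual_rate, 3))
```

In the figure's terms, the category would set arrow color and opacity, and the averaged rate would set arrow width.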