Cryptic transmission of SARS-CoV-2 in Washington State

transPhyloTools is a set of tools for simulating transmission trees, phylogenies, and pathogen evolution.

I’m prototyping tools in Matlab to explore ideas and model structures. While all tools are usable in principle for research, no effort has gone into performance, and some patterns are anti-optimized for Matlab’s memory management. Promising tools will be reimplemented in Python/R/C++/CUDA as appropriate.

treeClusterSim

treeClusterSim is a prototype of the next generation of transmission models that take the “pathogen’s-eye view” of transmission. The key ideas are that:

  1. the transmission tree is fundamental (whether we care to observe it or not).
  2. there are two scales of transmission: within-population (deme/node) mixing according to a model, and between-population transmission, in which new populations can be spawned spontaneously.
  3. nodes in the tree are infections, not individuals, regions etc. In simulation, the only metadata that has meaning is time.
  4. all questions about hidden states are statistical or post-hoc mapping questions, not dynamical questions.

The goal of the prototype is to develop these ideas and see where they break down.

augmentTransTree

augmentTransTree is a tool to add attributes to transmission trees that share the linelist output format from treeClusterSim. The idea of this tool is that any attribute or “state” information that doesn’t affect the pathogen’s-eye view of transmission can be added after the transmission tree is generated.

For example, treeClusterSim assumes that the pathogen experiences SIR-like pockets within a larger world, and augmentTransTree can post-hoc annotate attributes of the sampled pathogens (or their hosts) in those pockets.

  • pathogen attributes (not yet implemented)
    1. sequences assuming neutral evolution
  • host attributes
    1. age, sex, vaccination status, sampling modality…
    2. geographic location (with a spatial connectivity model)
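
As a rough illustration of post-hoc annotation, the sketch below adds a host attribute to a linelist without touching the transmission columns. It is written in Python rather than the repo's Matlab, and the function name `augment_with_age`, the `age` column, and the uniform age model are all assumptions made for this example, not part of augmentTransTree.

```python
import random


def augment_with_age(linelist, seed=0):
    """Return a copy of the linelist with a hypothetical 'age' column added.

    Ages are drawn independently of the transmission tree, so the
    id / infectedById / timeInfected columns are left unchanged.
    """
    rng = random.Random(seed)  # seeded so the annotation is reproducible
    augmented = []
    for row in linelist:
        new_row = dict(row)                  # copy; do not mutate the input
        new_row["age"] = rng.randint(0, 89)  # assumed uniform age model
        augmented.append(new_row)
    return augmented


# Toy linelist using the three required columns
linelist = [
    {"id": 1, "infectedById": "ROOT", "timeInfected": 0.0},
    {"id": 2, "infectedById": 1, "timeInfected": 1.5},
    {"id": 3, "infectedById": 1, "timeInfected": 2.0},
]
augmented = augment_with_age(linelist, seed=42)
```

Because the attribute is assigned after the fact, rerunning with a different seed or age model changes only the annotation, not who infected whom.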

observeTransTree

observeTransTree provides tools to implement a complex sampling strategy based on metadata generated by augmentTransTree.m.
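
One simple way to think about a metadata-driven sampling frame is as a per-infection sampling probability. The Python sketch below assumes that framing; `observe`, `prob_by_age`, and the `age` column are hypothetical names for illustration and do not describe the actual observeTransTree interface.

```python
import random


def observe(linelist, sampling_prob, seed=0):
    """Return the ids of infections that end up in the sample.

    sampling_prob maps one linelist row to a probability in [0, 1];
    it encodes the sampling frame.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    return [row["id"] for row in linelist if rng.random() < sampling_prob(row)]


def prob_by_age(row):
    # Hypothetical frame: older hosts are more likely to be sampled
    return 0.8 if row["age"] >= 65 else 0.2


# Toy linelist with an 'age' column added post hoc
linelist = [
    {"id": 1, "infectedById": "ROOT", "timeInfected": 0.0, "age": 70},
    {"id": 2, "infectedById": 1, "timeInfected": 1.5, "age": 30},
    {"id": 3, "infectedById": 1, "timeInfected": 2.0, "age": 12},
]
sampled_ids = observe(linelist, prob_by_age, seed=1)
```

More complex frames (time windows, geography, sampling modality) would only change the function passed as `sampling_prob`.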

phyTreeFromTransTree

phyTreeFromTransTree downsamples a complete transmission tree (provided as an edgelist of who infected whom and when and a set of observed tips) into a bifurcating phylogenetic tree. The main function is samplePhyloTree.m, and you can step through a demo with buildExample.m.
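
One intermediate step in this downsampling, keeping only the sampled infections and their ancestors, can be sketched in a few lines. This is a Python illustration under the linelist conventions described in this README, not a port of samplePhyloTree.m; the function name `ancestors_and_sample` is hypothetical, and the conversion to a bifurcating tree is not shown.

```python
def ancestors_and_sample(linelist, sampled_ids):
    """Keep only sampled infections and their ancestors.

    Rows are returned in their original linelist order.
    """
    parent = {row["id"]: row["infectedById"] for row in linelist}
    keep = set()
    for tip in sampled_ids:
        if tip not in parent:
            raise KeyError(f"sampled id {tip!r} is not in the linelist")
        node = tip
        # Walk toward the root; stop at a ROOT label (not an id in the
        # linelist) or at a node whose ancestry was already visited.
        while node in parent and node not in keep:
            keep.add(node)
            node = parent[node]
    return [row for row in linelist if row["id"] in keep]


# Toy linelist: 1 infects 2 and 3, 2 infects 4, 3 infects 5
linelist = [
    {"id": 1, "infectedById": "ROOT", "timeInfected": 0.0},
    {"id": 2, "infectedById": 1, "timeInfected": 1.0},
    {"id": 3, "infectedById": 1, "timeInfected": 1.2},
    {"id": 4, "infectedById": 2, "timeInfected": 2.0},
    {"id": 5, "infectedById": 3, "timeInfected": 2.5},
]
pruned = ancestors_and_sample(linelist, sampled_ids=[4])
kept_ids = [row["id"] for row in pruned]  # [1, 2, 4]
```

Stopping the upward walk at an already-kept node means each infection is visited at most once, however many sampled tips share its ancestry.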

Typical workflow

  1. Generate complete linelist of infections with metadata and known ancestors from treeClusterSim or your favorite model.
    • If you use your own model, valid linelist output for downstream processing must look like this test example. The three required columns are id, infectedById, and timeInfected. Other columns can include additional metadata, individual traits relevant to transmission, etc. These may be useful for augmentTransTree and observeTransTree but are not required.
    • Required column specification
      • id must contain a unique identifier for every infection in the simulation run. The datatype can be either character or numeric; it is only essential that the IDs are unique, otherwise downstream processing into trees will break.
      • infectedById is the unique identifier of the parent of each infection in id. For ids present at the start of the simulation, for which no parent exists, assign a “ROOT” label to infectedById. You can further structure those roots however you want (see treeClusterSim examples for ones with multiple independent roots), provided the string “ROOT” appears somewhere in the infectedById.
      • timeInfected must be the numeric time at which id was infected. In downstream tree processing, we assume that infections are sampled at the time of transmission. (It won’t be hard to add a sampling time distinct from the infection time in a future version of this software, but that isn’t an option right now.)
  2. (OPTIONAL) If you want to add some metadata that has no effect on transmission, configure and run augmentTransTree. See augmentTransTree/buildExample.m for more.
  3. (OPTIONAL) To downsample the complete linelist with an arbitrarily complex sampling frame given the metadata, see observeTransTree/buildExample.m.
  4. To generate a phylogenetic tree from a sample of the complete linelist, use the tools in phyTreeFromTransTree. See phyTreeFromTransTree/buildExample.m for workflow.
    • Generate the full, true transmission tree from the full linelist. This transmission tree is a directed graph that contains all infections as nodes.
    • Generate the downsampled true transmission tree from the full tree given the list of sampled infections. This directed graph contains only the sampled infections and their ancestors.
    • Transform the downsampled transmission tree into a time-scaled phylogenetic tree. This turns a transmission tree in which sampled nodes can be internal into one where all N sampled nodes are tips and the only internal nodes are the N-1 most recent common ancestors of the sample.
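
If you bring your own model, the required-column rules from step 1 can be checked before any tree building. The Python sketch below is one possible check, not a tool shipped with this repo; the name `validate_linelist` is hypothetical, and the rule that every non-ROOT infectedById must match some id is an assumption inferred from the column description.

```python
import numbers

REQUIRED = ("id", "infectedById", "timeInfected")


def validate_linelist(linelist):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for i, row in enumerate(linelist):
        missing = [col for col in REQUIRED if col not in row]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
    if problems:
        return problems  # the remaining checks need all three columns

    ids = [row["id"] for row in linelist]
    if len(set(ids)) != len(ids):
        problems.append("ids are not unique")

    known = set(ids)
    for row in linelist:
        parent = row["infectedById"]
        # A root is any infectedById string that contains "ROOT"
        is_root = isinstance(parent, str) and "ROOT" in parent
        if not is_root and parent not in known:
            # Assumed rule: every non-root parent must itself be in id
            problems.append(f"id {row['id']!r}: unknown parent {parent!r}")
        t = row["timeInfected"]
        if isinstance(t, bool) or not isinstance(t, numbers.Real):
            problems.append(f"id {row['id']!r}: timeInfected is not numeric")
    return problems


good = [
    {"id": "a", "infectedById": "ROOT_1", "timeInfected": 0.0},
    {"id": "b", "infectedById": "a", "timeInfected": 1.5},
]
bad = [
    {"id": "a", "infectedById": "ROOT", "timeInfected": 0.0},
    {"id": "a", "infectedById": "z", "timeInfected": "soon"},
]
good_problems = validate_linelist(good)
bad_problems = validate_linelist(bad)
```

Returning a list of messages rather than raising on the first failure makes it easier to fix every issue in a custom linelist in one pass.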

Complete examples of the workflow applied to respiratory illness are available in the parallel repo transGenEpi/seattleFluSimulatedData.