Phylogeny describes evolutionary relationships
Phylogeny is usually a hypothesis based on characteristics of sampled taxa
Phylogeny implies a series of mutational events leading to observed tip states
Among competing hypotheses that predict equally well, the one with the fewest assumptions should be selected.
Parsimony suggests this topology requires 3 mutations at minimum
Parsimony suggests both topologies are equally tenable
Exercise: which topology is more likely under parsimony?
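One standard way to count the minimum number of mutations a topology requires is the Fitch algorithm. The sketch below assumes a hypothetical encoding (leaves are taxon names, internal nodes are `(left, right)` tuples) and an illustrative, made-up alignment. It scores each alignment column independently and sums; the topology with the lower total is preferred under parsimony.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a
# (left, right) tuple, so (("t1", "t2"), ("t3", "t4")) is a rooted 4-taxon tree.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch pass for one alignment column: (candidate states, minimum changes)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        # Children can share a state, so no extra change is needed here.
        return shared, left_cost + right_cost
    # Children disagree: one change is required somewhere below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, alignment: Dict[str, str]) -> int:
    """Sum of minimum changes over all columns of an equal-length alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {taxon: seq[i] for taxon, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Illustrative data only.
alignment = {"t1": "AAG", "t2": "AAG", "t3": "GCA", "t4": "GCA"}
print(parsimony_score((("t1", "t2"), ("t3", "t4")), alignment))  # 3
print(parsimony_score((("t1", "t3"), ("t2", "t4")), alignment))  # 6
```

Rooting an unrooted tree at any branch leaves the Fitch score unchanged, so the same function can compare unrooted topologies.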
"Data" is generally a sequence alignment
Phylogeny structures site patterns
Buffalo. 2015
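To make "site patterns" concrete, the sketch below reads each alignment column as a pattern and flags the parsimony-informative ones (at least two states, each present in at least two taxa); only those columns can favor one topology over another under parsimony. Treating `-`, `N` and `?` as missing data is an assumption of this sketch, and the alignment is made up.

```python
from collections import Counter
from typing import Dict

# Assumption: characters in this set are treated as missing data.
MISSING = set("-N?")


def site_patterns(alignment: Dict[str, str]) -> Counter:
    """Count each distinct column, read top to bottom in sorted taxon order."""
    taxa = sorted(alignment)
    length = len(alignment[taxa[0]])
    if any(len(alignment[t]) != length for t in taxa):
        raise ValueError("sequences must all have the same length")
    return Counter("".join(alignment[t][i] for t in taxa) for i in range(length))


def is_parsimony_informative(column: str) -> bool:
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(c for c in column if c not in MISSING)
    return sum(1 for n in counts.values() if n >= 2) >= 2


# Illustrative data only.
alignment = {"t1": "AAGT", "t2": "AAGT", "t3": "GCAT", "t4": "GCTT"}
patterns = site_patterns(alignment)
informative = [col for col in patterns if is_parsimony_informative(col)]
print(informative)  # ['AAGG', 'AACC']
```

The constant column `TTTT` and the singleton-variant column `GGAT` cost the same number of changes on every topology, so they carry no parsimony signal.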
Tree space is vast
There are (2n-3)!! rooted binary trees for n taxa, where !! is the double factorial
- 3 taxa: 3 trees
- 5 taxa: 105 trees
- 10 taxa: 34,459,425 trees
- 20 taxa: 8.2 × 10²¹ trees
- 50 taxa: 2.8 × 10⁷⁶ trees
- 100 taxa: 3.3 × 10¹⁸⁴ trees
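The counts above follow directly from the double factorial. A minimal sketch that reproduces them with exact integer arithmetic:

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 (or 2)."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def num_rooted_trees(n: int) -> int:
    """Number of distinct rooted binary trees with n labeled tips: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least two taxa")
    return double_factorial(2 * n - 3)


for n in (3, 5, 10, 20, 50, 100):
    print(n, f"{num_rooted_trees(n):.1e}")
```

Python integers are arbitrary precision, so even the 100-taxon count is computed exactly before being formatted in scientific notation.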
Solution space is rugged
Types of phylogenetic inference methods
- Distance-based (neighbor-joining, fast, heuristic)
- Parsimony (fast, "model-free")
- Maximum likelihood (estimates a model of mutation, accurate, examples: FastTree, RAxML, IQ-TREE)
- Bayesian (like ML, but requires a prior, produces estimates of uncertainty, examples: MrBayes, BEAST)
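Distance-based methods such as neighbor-joining start from a matrix of pairwise distances. The sketch below shows one common way to build that matrix: the raw proportion of differing sites (p-distance), corrected for multiple hits with the Jukes-Cantor (JC69) model. Choosing JC69 and skipping non-ACGT columns are assumptions here; neighbor-joining itself is not implemented.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

NUCLEOTIDES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns with gaps or ambiguity codes."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in NUCLEOTIDES and y in NUCLEOTIDES]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(a: str, b: str) -> float:
    """Jukes-Cantor corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """JC69 distance for every unordered pair of taxa."""
    return {(s, t): jc69_distance(alignment[s], alignment[t])
            for s, t in combinations(sorted(alignment), 2)}


print(distance_matrix({"t1": "AAAA", "t2": "AAAT", "t3": "AGGT"}))
```

The corrected distance is always at least the raw p-distance, because some sites will have changed more than once along the path between two sequences.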
Inference yields a tree topology, branch lengths and ancestral states
Molecular clocks and dated phylogenies
Mutations tend to accumulate in a clock-like fashion
"Root-to-tip" plots show temporal signal
A clock rate allows conversion between branch length and time
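A root-to-tip plot is usually summarized with a least-squares line: the slope estimates the clock rate (substitutions per site per year) and the x-intercept estimates when divergence was zero, i.e. the date of the root. The sketch below assumes sampling dates in decimal years and root-to-tip divergences already read off a tree; the data are synthetic. Dedicated tools fit more careful models, so treat this as an illustration of the idea only.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(dates: Sequence[float],
                           divergences: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares of divergence on date; returns (rate, root_date)."""
    n = len(dates)
    if n != len(divergences) or n < 2:
        raise ValueError("need at least two (date, divergence) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(divergences) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("tips must be sampled at different dates")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, divergences))
    rate = sxy / sxx
    if rate <= 0:
        raise ValueError("no positive temporal signal")
    intercept = mean_y - rate * mean_x
    root_date = -intercept / rate  # date at which divergence extrapolates to zero
    return rate, root_date


def branch_length_to_years(branch_length: float, rate: float) -> float:
    """Convert substitutions per site into years under a strict clock."""
    return branch_length / rate


# Synthetic tips sampled in 2000, 2005 and 2010.
rate, root_date = root_to_tip_regression([2000.0, 2005.0, 2010.0], [0.0, 0.005, 0.01])
print(rate, root_date)
print(branch_length_to_years(0.003, rate))
```

A scattered plot with a shallow or negative slope is the usual warning sign that the data lack enough temporal signal for dating.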
Dated phylogenies provide real-world context
Inference of discrete traits
"Data" is a phylogeny and tip states
States include nucleotides, amino acids, geographic locations, hosts, etc.
The model infers a transition matrix and ancestral states
Rare transitions, short branches and many taxa increase confidence
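For a discrete trait with two states, the transition probabilities along a branch of length t have a closed form. The sketch assumes hypothetical rates `rate_01` (0 to 1) and `rate_10` (1 to 0) in changes per unit branch length. It shows why short branches and low rates make a change unlikely: the off-diagonal entries start at 0 and only approach the stationary frequencies as t grows. For k states the same quantity is the matrix exponential P(t) = exp(Qt), typically computed numerically (for example with `scipy.linalg.expm`).

```python
import math
from typing import List


def two_state_transition_matrix(rate_01: float, rate_10: float,
                                t: float) -> List[List[float]]:
    """P[i][j] = probability of ending in state j after time t, starting in i."""
    total = rate_01 + rate_10
    if total <= 0:
        return [[1.0, 0.0], [0.0, 1.0]]  # no transitions possible
    decay = math.exp(-total * t)
    pi0, pi1 = rate_10 / total, rate_01 / total  # stationary frequencies
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]


for t in (0.1, 1.0, 10.0):
    print(t, two_state_transition_matrix(0.1, 0.1, t))
```

With both rates at 0.1, the chance of a change along a branch of length 1 is about 0.09, while along a branch of length 10 it approaches 0.5.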
Nesting patterns are informative
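How nesting of tip states informs an ancestral state can be sketched with Felsenstein's pruning algorithm for a single two-state character. This sketch keeps the rates fixed (rather than estimating them), uses the stationary distribution as the root prior, and returns only the marginal posterior at the root; the tree encoding and the example tree are hypothetical.

```python
import math
from typing import List, Tuple, Union

# Hypothetical encoding: a tip is its observed state (0 or 1); an internal
# node is a pair of (child, branch_length) tuples.
Node = Union[int, Tuple[Tuple["Node", float], Tuple["Node", float]]]


def transition(rate_01: float, rate_10: float, t: float) -> List[List[float]]:
    """Closed-form P(t) for a two-state continuous-time Markov chain."""
    total = rate_01 + rate_10
    decay = math.exp(-total * t)
    pi0, pi1 = rate_10 / total, rate_01 / total
    return [[pi0 + pi1 * decay, pi1 * (1 - decay)],
            [pi0 * (1 - decay), pi1 + pi0 * decay]]


def partial_likelihoods(node: Node, rate_01: float, rate_10: float) -> List[float]:
    """Pruning step: P(tip data below node | node in state s) for s = 0, 1."""
    if isinstance(node, int):
        return [1.0 if s == node else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child, length in node:
        child_lik = partial_likelihoods(child, rate_01, rate_10)
        p = transition(rate_01, rate_10, length)
        for s in (0, 1):
            result[s] *= sum(p[s][c] * child_lik[c] for c in (0, 1))
    return result


def root_state_posterior(tree: Node, rate_01: float, rate_10: float) -> List[float]:
    """Marginal posterior of the root state, with a stationary root prior."""
    if rate_01 <= 0 or rate_10 <= 0:
        raise ValueError("rates must be positive")
    total = rate_01 + rate_10
    prior = [rate_10 / total, rate_01 / total]
    lik = partial_likelihoods(tree, rate_01, rate_10)
    joint = [prior[s] * lik[s] for s in (0, 1)]
    norm = sum(joint)
    return [j / norm for j in joint]


# Two tips in state 0 and one in state 1; the state-1 tip is nested next to a 0 tip.
tree = ((0, 0.1), (((0, 0.1), (1, 0.1)), 0.1))
print(root_state_posterior(tree, 1.0, 1.0))  # roughly [0.91, 0.09]
```

Shortening the branches in this example pushes the root posterior for state 0 even closer to 1, matching the point that short branches increase confidence.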
Zika phylogeny infers an origin in northeast Brazil
Influenza phylogeny shows repeated spread from East and Southeast Asia
Influenza transitions mirror air travel network
Detailed inference of SARS-CoV-2 introductions into the UK
Reservoir species and host jumps
Influenza has a reservoir in wild birds; spillovers into other hosts sometimes become endemic
Each Ebola outbreak derives from a separate spillover event
MERS-CoV has frequent spillover events, but limited human-to-human transmission
Reassortment and recombination
Influenza B reassorts across segments
Reassortment splits gene constellations
Recombination in seasonal coronaviruses
Summary
- Phylogenetics uses sequence data to infer trees along with ancestral states
- Molecular clocks provide real-world context for evolutionary events
- Phylogeny reveals behavior of discrete traits, like geographic location and host state
Active research
- How to effectively utilize large sequence datasets?
- How to combat sampling bias?
- How to deal with recombination?