Characterizing the informativeness of pathogen genome sequence datasets about transmission between population groups

Tran Kiem C, Perofsky AC, Lessler J, Bedford T. 2025. medRxiv: 2025.08.07.25333239.

Abstract

Pathogen genome analysis helps characterize transmission between population groups. The information carried by pathogen sequences comes from the accumulation of mutations within their genomes; thus, the pace at which mutations accumulate should determine the granularity of transmission processes that pathogen sequences can characterize. Here, we investigate how the complex interplay between mutation, transmission, population mixing and sampling impacts the power of phylogeographic studies. First, we develop a conceptual probabilistic framework to quantify the ability of pairs of sequences to capture migration history. This allows us to comprehensively explore the space of possible phylogeographic analyses by explicitly considering the pace at which mutations accumulate and the pace at which migration events occur. Using this framework, we identify a pathogen-intrinsic limit on the mixing scale at which sequence data remain informative, with faster-mutating pathogens enabling finer spatial characterization. Second, we perform a simulation study exploring a range of assumptions regarding sequencing intensity. We find that sample size further imposes a limit on the characterization of mixing processes. This work highlights inherent horizons of observability for population mixing processes that depend on the interaction between evolution, transmission, mixing and sampling. Such considerations are important for the design of phylogeographic studies.