Inferring Sequence Traits (like Drug Resistance)
Unfortunately, this method currently works only with VCF input files. It will be updated to work with FASTA input soon!
The augur command sequence-traits can identify any trait associated with particular nucleotide or amino-acid mutations, but it’s often used to identify drug resistance mutations (DRMs).
To tell augur which sites confer which trait (or drug resistance), you’ll need to pass it a file describing these sites.
The file should usually contain five columns: GENE, SITE, ALT, DISPLAY_NAME, and FEATURE. DISPLAY_NAME can be blank, and the GENE column can be omitted if only nucleotide locations are used.
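To make the column layout concrete, here is a minimal Python sketch that reads such a file into records. It assumes the file is tab-separated (several DISPLAY_NAME and FEATURE values in the examples below contain spaces, so plain whitespace cannot separate the columns) and that a missing GENE column means every SITE is a nucleotide position. `FeatureSite` and `read_feature_sites` are hypothetical names for illustration; this is not augur's own parser.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class FeatureSite:
    """One row of the sites file (hypothetical record type)."""
    gene: str          # gene name, or "nuc" for a nucleotide position
    site: int          # position exactly as written in the file
    alt: str           # base or amino acid that confers the trait
    display_name: str  # may be empty
    feature: str       # trait, e.g. a drug name


def read_feature_sites(handle):
    """Parse a tab-separated sites file (assumed format, not augur's parser)."""
    reader = csv.DictReader(handle, delimiter="\t")
    sites = []
    for row in reader:
        sites.append(FeatureSite(
            # Assumption: a file without a GENE column holds nucleotide sites only.
            gene=row.get("GENE") or "nuc",
            site=int(row["SITE"]),
            alt=row["ALT"],
            # An empty DISPLAY_NAME (or a missing column) becomes "".
            display_name=row.get("DISPLAY_NAME") or "",
            feature=row["FEATURE"],
        ))
    return sites


example = io.StringIO(
    "SITE\tALT\tDISPLAY_NAME\tFEATURE\n"
    "1472749\tA\trrs: C904A\tStreptomycin\n"
    "1473246\tG\trrs: A1401G\tAmikacin Capreomycin Kanamycin\n"
)
for s in read_feature_sites(example):
    print(s.gene, s.site, s.alt, s.display_name, "|", s.feature)
```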
Amino Acid Sites
For example, for drug resistance in TB, we list the gene, the AA position in the gene, the AA mutation that confers resistance (you can list a site multiple times if multiple amino acids give resistance), and the name of the drug this mutation gives resistance to:
GENE SITE ALT DISPLAY_NAME FEATURE
gyrB 461 N Fluoroquinolones
gyrB 499 D Fluoroquinolones
rpoB 432 E Rifampicin
rpoB 432 K Rifampicin
We can leave DISPLAY_NAME blank, as auspice will by default display the gene, the site, and the original and alternative amino acid.
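Leaving DISPLAY_NAME blank still means keeping an empty column, so that every row has five fields. A minimal sketch, assuming a tab-separated file, writes the rows above with Python's csv module, which emits the empty field as two adjacent tabs; `to_tsv` is a hypothetical helper, not part of augur.

```python
import csv
import io

# Rows from the TB example above; DISPLAY_NAME is deliberately left empty.
HEADER = ["GENE", "SITE", "ALT", "DISPLAY_NAME", "FEATURE"]
ROWS = [
    ["gyrB", 461, "N", "", "Fluoroquinolones"],
    ["gyrB", 499, "D", "", "Fluoroquinolones"],
    ["rpoB", 432, "E", "", "Rifampicin"],
    ["rpoB", 432, "K", "", "Rifampicin"],  # same site listed again for a second ALT
]


def to_tsv(header, rows):
    """Serialize rows as tab-separated text, keeping empty columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


text = to_tsv(HEADER, ROWS)
print(text)
# Every line keeps five fields, even with DISPLAY_NAME empty.
assert all(len(line.split("\t")) == 5 for line in text.splitlines())
```

Writing `text` to a file of your choosing gives a sites file in the layout described above.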
Nucleotide Sites
For mutations outside of protein-coding genes, we can specify their position using nucleotides, and specify how we’d like them to be named when displayed:
GENE SITE ALT DISPLAY_NAME FEATURE
nuc 1472749 A rrs: C904A Streptomycin
nuc 1473246 G rrs: A1401G Amikacin Capreomycin Kanamycin
nuc 1673423 T fabG1: G-17T Isoniazid Ethionamide
nuc 1673425 T fabG1: C-15T Isoniazid Ethionamide
In the TB literature, these mutations are still referred to by their position within non-protein-coding genes (rrs) or their location relative to genes (-17 fabG1), not by their nucleotide position in the genome. We can make auspice display this more familiar nomenclature by filling in the DISPLAY_NAME column.
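The difference between the two naming schemes is simple arithmetic. The sketch below converts a genome position into a position within rrs and falls back to a genome-coordinate label when DISPLAY_NAME is empty. It assumes rrs is on the forward strand and starts at genome position 1471846; that start is inferred from the two rrs rows above (it reproduces both C904A and A1401G), not read from an annotation file. The fallback label format is a hypothetical stand-in, not auspice's exact rendering, and `gene_position` and `label_for` are illustrative names.

```python
# Assumption: rrs lies on the forward strand and starts at genome position
# 1471846; this start is inferred from the two rrs rows in the table above.
RRS_START = 1471846


def gene_position(genome_pos, gene_start):
    """1-based position within a forward-strand gene."""
    return genome_pos - gene_start + 1


def label_for(site, ref, alt, display_name=""):
    """Return DISPLAY_NAME when given, else a genome-coordinate label.

    The fallback "nuc: REF SITE ALT" format is hypothetical and only
    approximates what a viewer might show.
    """
    if display_name:
        return display_name
    return f"nuc: {ref}{site}{alt}"


# Without DISPLAY_NAME: only the genome coordinate is available.
print(label_for(1472749, "C", "A"))
# With DISPLAY_NAME built from the rrs-relative position (904).
print(label_for(1472749, "C", "A", f"rrs: C{gene_position(1472749, RRS_START)}A"))
# The second rrs row: 1473246 - 1471846 + 1 = 1401.
print(gene_position(1473246, RRS_START))
```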
If you are only using nucleotide sites, you can also omit the GENE column:
SITE ALT DISPLAY_NAME FEATURE
1472749 A rrs: C904A Streptomycin
1473246 G rrs: A1401G Amikacin Capreomycin Kanamycin
Both Nucleotide Sites and AA Sites
You can also mix sites identified by nucleotide position and those identified by AA position:
GENE SITE ALT DISPLAY_NAME FEATURE
gyrB 461 N Fluoroquinolones
gyrB 499 D Fluoroquinolones
rpoB 432 E Rifampicin
rpoB 432 K Rifampicin
nuc 1472749 A rrs: C904A Streptomycin
nuc 1473246 G rrs: A1401G Amikacin Capreomycin Kanamycin
nuc 1673423 T fabG1: G-17T Isoniazid Ethionamide
nuc 1673425 T fabG1: C-15T Isoniazid Ethionamide
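Because a site may be listed more than once (as with rpoB 432 above) and nucleotide and amino-acid rows share one file, it can help to index the rows by (GENE, SITE). This is a hypothetical sketch of such a lookup, not augur's internal data structure; DISPLAY_NAME is dropped for brevity.

```python
from collections import defaultdict

# (GENE, SITE, ALT, FEATURE) from the mixed table above.
ROWS = [
    ("gyrB", 461, "N", "Fluoroquinolones"),
    ("gyrB", 499, "D", "Fluoroquinolones"),
    ("rpoB", 432, "E", "Rifampicin"),
    ("rpoB", 432, "K", "Rifampicin"),
    ("nuc", 1472749, "A", "Streptomycin"),
    ("nuc", 1473246, "G", "Amikacin Capreomycin Kanamycin"),
    ("nuc", 1673423, "T", "Isoniazid Ethionamide"),
    ("nuc", 1673425, "T", "Isoniazid Ethionamide"),
]


def index_sites(rows):
    """Map (gene, site) to {alt: feature}; gene "nuc" marks nucleotide sites."""
    index = defaultdict(dict)
    for gene, site, alt, feature in rows:
        index[(gene, site)][alt] = feature
    return index


index = index_sites(ROWS)
nuc_sites = sorted(site for gene, site in index if gene == "nuc")
aa_sites = sorted(key for key in index if key[0] != "nuc")

print(index[("rpoB", 432)])  # {'E': 'Rifampicin', 'K': 'Rifampicin'}
print(nuc_sites)             # [1472749, 1473246, 1673423, 1673425]
print(aa_sites)              # [('gyrB', 461), ('gyrB', 499), ('rpoB', 432)]
```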
Options
The sequence-traits command will return a value for each “feature”: for example, all the mutations on the tree that lead to resistance to Streptomycin. It will also generate a count of either the total number of “features” each node has (ex: the total number of drugs a sequence is resistant to) or the total number of mutations specified in the file that each node has (ex: the total number of DRMs a sequence has, even if some are for the same drug).
You can specify a name for this count using the --label argument (ex: “Drug_Resistance”). The value of the --count argument specifies what is counted: use traits to count the number of traits (ex: drugs a sequence is resistant to), or mutations to count the number of overall mutations.
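The difference between the two --count modes can be sketched as follows. The node's mutations here are a made-up example built from rows in the tables above, and `count_for_node` is a hypothetical helper that mirrors the description on this page rather than augur's implementation: traits counts distinct FEATURE values, mutations counts every listed mutation.

```python
def count_for_node(hits, mode):
    """Count trait-defining mutations on one node (illustrative only).

    hits: list of (mutation_label, feature) pairs the node carries.
    mode "traits": number of distinct FEATURE values (e.g. drugs resisted).
    mode "mutations": number of listed mutations, even if they share a feature.
    """
    if mode == "traits":
        return len({feature for _, feature in hits})
    if mode == "mutations":
        return len(hits)
    raise ValueError(f"unknown count mode: {mode!r}")


# Made-up node carrying three mutations from the tables above.
node_hits = [
    ("fabG1: G-17T", "Isoniazid Ethionamide"),
    ("fabG1: C-15T", "Isoniazid Ethionamide"),
    ("rrs: C904A", "Streptomycin"),
]

print(count_for_node(node_hits, "traits"))     # 2
print(count_for_node(node_hits, "mutations"))  # 3
```

In this sketch a FEATURE value that names several drugs, such as Isoniazid Ethionamide, counts as a single trait because it is one FEATURE string.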