Bioinformatic pipelines for mumps genome sequencing

Louise Moncla¹, Allison Black¹,², Trevor Bedford¹

¹Department of Epidemiology, University of Washington, Seattle, WA, USA; ²Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA

Overview of bioinformatic processing of mumps sequencing reads

Adapter and quality trimming with Trimmomatic
Mapping with bowtie2
Manual inspection of mapping and consensus genome calling with Geneious
Re-mapping fastq files to the called consensus with bowtie2

Trimming

Trimming was performed with Trimmomatic to remove Illumina adapter sequences and low-quality read ends. Adapters were clipped with ILLUMINACLIP (1 seed mismatch, palindrome clip threshold 30, simple clip threshold 10). Reads were then scanned with a 5-bp sliding window and cut where the average quality within the window fell below Q30, and trimmed reads shorter than 100 bp were discarded. The following command was used:

java -jar Trimmomatic-0.36/trimmomatic-0.36.jar SE input.fastq output.fastq ILLUMINACLIP:Nextera_XT_adapter.fa:1:30:10 SLIDINGWINDOW:5:30 MINLEN:100
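To make the SLIDINGWINDOW:5:30 and MINLEN:100 steps concrete, the following is a simplified Python sketch of quality trimming on a single read, assuming Phred+33 quality encoding. It cuts at the start of the first 5-bp window whose mean quality is below 30; Trimmomatic's own implementation may retain a few more bases within that window, and adapter clipping (ILLUMINACLIP) is not reproduced. The function names are hypothetical.

```python
from statistics import mean


def phred33(qual_string):
    """Decode a Phred+33 FASTQ quality string into integer quality scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_trim(seq, qual, window=5, threshold=30, min_len=100):
    """Simplified SLIDINGWINDOW:<window>:<threshold> followed by MINLEN:<min_len>.

    Scans the read 5' to 3' and cuts it at the start of the first window whose
    mean quality falls below `threshold`. Returns the trimmed (seq, qual) pair,
    or None if the trimmed read is shorter than `min_len` and would be dropped.
    """
    scores = phred33(qual)
    keep = len(seq)  # keep the whole read unless a window fails
    for start in range(len(seq) - window + 1):
        if mean(scores[start:start + window]) < threshold:
            keep = start
            break
    if keep < min_len:
        return None  # discarded, as MINLEN would do
    return seq[:keep], qual[:keep]
```

For a 120-bp read whose last 10 bases are Q2, sliding_window_trim("A" * 120, "I" * 110 + "#" * 10) keeps the first 107 bases, because the window starting at position 107 is the first whose mean quality drops below Q30.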

Mapping

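We used a genome from the mumps outbreak in Massachusetts as the reference sequence and aligned the trimmed reads to it with bowtie2 in local alignment mode (--local). Both trimmed read files were supplied with -U, so bowtie2 treated them as unpaired reads. The -x argument expects the basename of an index built beforehand with bowtie2-build (e.g., bowtie2-build reference_sequence.fasta reference_sequence.fasta, which makes the command below valid as written):

bowtie2 -x reference_sequence.fasta -U read1.trimmed.fastq,read2.trimmed.fastq -S output.sam --local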
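Because bowtie2 aligns against an index rather than a raw fasta file, the mapping step is really two commands. The hypothetical Python helper below only assembles both argument lists (file names are the placeholders from the text) and, if run() is called, executes them with subprocess; it assumes bowtie2 and bowtie2-build are installed and on the PATH.

```python
import shlex
import subprocess


def bowtie2_commands(reference_fasta, index_base, reads, sam_out):
    """Return argv lists for indexing a reference and locally aligning unpaired reads."""
    build = ["bowtie2-build", reference_fasta, index_base]
    align = [
        "bowtie2", "--local",
        "-x", index_base,        # index basename, not the fasta itself
        "-U", ",".join(reads),   # comma-separated files are read as unpaired
        "-S", sam_out,
    ]
    return build, align


def run(commands):
    """Execute each command in order, raising CalledProcessError on failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    build, align = bowtie2_commands(
        "reference_sequence.fasta",
        "reference_sequence.fasta",  # index basename chosen to match the text
        ["read1.trimmed.fastq", "read2.trimmed.fastq"],
        "output.sam",
    )
    for argv in (build, align):
        print(shlex.join(argv))
    # run([build, align])  # uncomment once the tools and input files exist
```

Keeping the commands as argument lists, rather than shell strings, avoids quoting problems when sample file names contain spaces.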

The resulting alignment (written by bowtie2 as SAM and handled as a BAM file) was manually inspected in Geneious.
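Much of this inspection concerns read depth along the genome. As a rough, hypothetical illustration of what a depth track summarizes, the sketch below counts per-position depth from SAM text records by walking each CIGAR string. It skips unmapped and secondary alignments and does not count deletions; soft-clipped bases, which are common in --local alignments, consume no reference positions and so add no depth.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

UNMAPPED = 0x4
SECONDARY = 0x100


def depth_from_sam(sam_lines, ref_length):
    """Return a list of read depths, one per 0-based reference position."""
    depth = [0] * ref_length
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, pos, cigar = int(fields[1]), int(fields[3]), fields[5]
        if flag & (UNMAPPED | SECONDARY) or cigar == "*":
            continue
        ref_pos = pos - 1  # SAM POS is 1-based
        for length_str, op in CIGAR_OP.findall(cigar):
            length = int(length_str)
            if op in "M=X":
                # Aligned read bases: each covers one reference position.
                for i in range(ref_pos, min(ref_pos + length, ref_length)):
                    depth[i] += 1
                ref_pos += length
            elif op in "DN":
                ref_pos += length  # consumes reference but no read base
            # I, S, H and P consume no reference positions.
    return depth
```

Usage might look like depth_from_sam(open("output.sam"), ref_length), where ref_length is the length of the reference genome.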

Consensus sequence calling

Consensus sequences were called in Geneious, with sites covered by fewer than 20 reads called as N. Consensus genomes were exported in fasta format.
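Geneious's consensus options are not reproduced here. As an assumption-laden sketch of the masking rule only, the code below makes a simple majority-rule call at each position, writes N wherever fewer than 20 A/C/G/T bases are observed, and formats the result as fasta. The pileup input format (one list of observed bases per reference position) and the function names are hypothetical.

```python
from collections import Counter

VALID_BASES = set("ACGT")


def call_consensus(pileup, min_depth=20):
    """Majority-rule consensus; N wherever fewer than min_depth A/C/G/T bases are seen."""
    consensus = []
    for bases in pileup:
        counts = Counter(b.upper() for b in bases if b.upper() in VALID_BASES)
        if sum(counts.values()) < min_depth:
            consensus.append("N")  # insufficient coverage
        else:
            # Ties go to the base seen first; a real pipeline might emit an
            # IUPAC ambiguity code instead.
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


def to_fasta(name, seq, width=60):
    """Format one sequence as a fasta record with fixed-width lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```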

Remapping

To avoid mapping artifacts caused by differences between each sample and the reference sequence, we then remapped each sample’s fastq files to that sample’s own consensus sequence. These alignments were again manually inspected in Geneious, and a final consensus sequence was called. Consensus genomes with at least 50% full-genome coverage (i.e., at least half of genome positions called as a base rather than N) are available here as fasta files.
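Assuming that 50% full-genome coverage means at least half of the consensus positions are called bases rather than N, the release filter can be sketched as below. genome_length is an optional, hypothetical parameter for consensus sequences that do not span the whole reference; by default the consensus length itself is used as the denominator.

```python
def called_fraction(seq, genome_length=None):
    """Fraction of genome positions called as A/C/G/T (N and ambiguity codes excluded)."""
    denominator = genome_length if genome_length else len(seq)
    if denominator == 0:
        return 0.0
    called = sum(1 for base in seq.upper() if base in "ACGT")
    return called / denominator


def passes_coverage(seq, min_fraction=0.5, genome_length=None):
    """True if the consensus meets the minimum called fraction for release."""
    return called_fraction(seq, genome_length) >= min_fraction
```

Counting only unambiguous bases is a conservative choice; if IUPAC ambiguity codes should count as called sites, the membership test would need to be widened.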