If you are reading this blog, you are probably already on board with releasing data openly as it's generated. You may even have led the charge in getting other researchers and/or journals to be more open with protocols, data, and analyses. You probably don't need a reminder of why open data sharing is awesome, but sometimes it's nice to take stock and remember that what we're pushing for has real, tangible value.

Louise has been investigating the genomic epidemiology of the mumps outbreak in Washington, and I've been helping out a bit too. If you want some big picture details about the project, you can read more about it here and here. As part of this project, Louise has been managing Nextstrain Mumps. Recently, Patrick Stapleton from Public Health Ontario shared his sequences with us (thank you!!). Louise rebuilt Nextstrain with them, and we were totally struck by how much context the Ontario viruses provided for one of our "one-off" Washingtonian sequences.

Take a look at the image with the before and after. Before, we see that we have a couple of viruses that don't nest within the primary Washington outbreak clade. They are most closely related to Canadian viruses (Manitoba, BC), but those branches are pretty long, an indicator that there's unsampled transmission occurring. This crops up not infrequently in sparse datasets, but I still always find myself wanting to know where this transmission chain was circulating before it pops up on our radar. Importantly, this isn't just about my curiosity; knowing where importations come from, and their frequency, is important for tailoring surveillance efforts and designing or evaluating infection control measures.

Here we get lucky on two counts: 1) other people are sequencing mumps, and 2) they like sharing data! With the Ontario viruses included in the tree, we see that Washington.USA/2017321 is a very clear introduction of mumps from Ontario into Washington. Given the high genetic similarity between this Washington strain and the viruses from Ontario, it seems pretty likely that this was a direct introduction or perhaps a very recent introduction followed by a short transmission chain.

This may seem trivial, but you can play through some different scenarios to show that it's not. Before, we don't really have any idea what is going on with Washington.USA/2017321. We might ask, does this strain represent a tiny chunk of a lot of transmission that is going unobserved? If so, do we need to ramp up surveillance in a particular population? With just a bit of context we realize that no, we don't need to pour a whole bunch of resources into figuring out what's going on here. We have a travel-associated introduction, and while it's good to follow up on close contacts, we probably don't need to take significant resources away from another cluster to look into this one.

This is such a clear example of how much more value we can get out of genomic surveillance when we pool our data. Other people's sequences provide context for our own. With this project, we've been incredibly fortunate to have lots of people sharing sequences with us. Many thanks to Jenn Gardy and Jeff Joy in British Columbia, Shirlee Wohl in Massachusetts, and Patrick Stapleton in Ontario. And of course thank you to all the authors who have put sequences up openly on GenBank. The mumps phylogeny would look ridiculous without you.

Positions for a full-stack developer and bioinformatician are available immediately in the Bedford lab at the Fred Hutch to work on an open-source genomic epidemiology research platform enabling a large-scale study of respiratory illness in Seattle.

In collaboration with groups at the Fred Hutch, the University of Washington, Seattle Children's, the Institute for Disease Modeling, and area hospitals, we're embarking on a high-resolution, multi-year study of influenza and other respiratory illnesses in Seattle. Through the study, thousands of influenza and other respiratory pathogens will be sampled and sequenced in near-real time, and transmission dynamics will be uncovered from these viral genome sequences. The primary task of both positions will be to develop the information and analysis systems underlying the study's research aims, with potential expansion to new studies around the world in the future. This new platform will build upon the software behind Nextstrain, an award-winning tool for tracking infectious disease epidemics developed in collaboration with the Neher lab at the University of Basel. Nextstrain won the Open Science Prize in Feb 2017, has already been instrumental in the analysis of Ebola spread in West Africa and Zika spread in the Americas, and is used by the World Health Organization to aid in the process of influenza vaccine strain selection.

The envisioned software platform will ingest subject and sample metadata, lab assay results, and raw and processed genome data, and will provide access to this curated data to streamline downstream analysis. This platform will be open-source, adaptable, and designed with future outbreak surveillance in mind, regardless of whether the targeted pathogen is viral, bacterial, or eukaryotic. On top of this data warehousing platform, we will deploy analytic pipelines that align sequences, build phylogenies, and reconstruct city-scale transmission chains.

The ideal candidate for the full-stack developer position would have expertise in web development, relational database systems, cloud infrastructure, and software engineering best practices. Experience working with genomic data or in systems integration is a plus but not a requirement.

The ideal candidate for the bioinformatician position would have expertise in genomics, molecular biology, pipeline automation, and software development practices. Experience working with cloud infrastructure and web technology is a plus but not a requirement.

Candidates for both positions should be fluent in at least one high-level programming language, such as Python, JavaScript, Ruby, or Perl. Candidates should also have excellent written and verbal communication skills. Interfacing with project collaborators in-person and online is a key aspect of both positions. Both positions will work within a small team of existing members of the Bedford lab. If you think you might be a great fit for this position but are concerned about meeting all qualifications, we'd like to hear from you.

The Fred Hutch is located in South Lake Union in Seattle, WA and offers a dynamic work environment with cutting-edge science and computational resources. The position is available immediately with flexible starting dates. Informal inquiries are welcome. Applications will be accepted until the position is filled. We offer a competitive salary commensurate with skills and experience, along with benefits. The Fred Hutch and the Bedford lab are committed to improving diversity in the computational sciences. Applicants of diverse backgrounds are particularly encouraged to apply. Depending on the applicant, this position could be a full-time salaried employee, a part-time employee, or a contracted consultant.

For more information about the lab, please see our website at bedford.io. To apply for the position please send (a) your current resume, (b) code samples or links to published/distributed code you've written, and (c) contact information for two references to tsibley@fredhutch.org.

We have a new preprint up on bioRxiv describing the genomic epidemiology of Zika virus in Colombia!

Via a collaboration with the Instituto Nacional de Salud de Colombia and the Universidad del Rosario, we sequenced 8 new Colombian genomes directly from clinical samples, and did the first detailed genomic analysis of Zika in Colombia.

For this study we performed a phylogeographic analysis of 386 American Zika genomes, of which 46 were sampled from Colombia (38 previously available genomes and 8 from this study). We found that Colombian samples grouped into two separate clades, indicating at least two separate introductions of Zika to Colombia. Remarkably, despite evidence for multiple introductions, the majority of transmission within Colombia appears to descend from only one introduction. We infer that this large clade was introduced to Colombia around March 2015, indicating that Zika likely spread to Colombia before it was even confirmed to be circulating in Brazil. Finally, we see lots of evidence in the tree for movement of Zika from Colombia to other countries. Most of this movement is to countries sharing a border with Colombia. For instance, viruses from Panama, Venezuela, and Peru are descended from viruses in the primary Colombian clade, and the small secondary clade shows viral migration from Colombia into Ecuador.

We should add that the bulk of the lab work for this study occurred in Bogotá, Colombia. The folks there were so helpful and supportive of the work, and we are grateful for the chance to have worked with such an incredible team (not to mention the opportunity to explore Colombia!).

In the spirit of open science we've tried to make our data and analyses freely available and reproducible. If you're interested in the laboratory protocols and bioinformatic pipeline we're using for genome assembly, you can find all of that at github.com/blab/zika-seq. If you would like to reproduce the analysis presented in the manuscript, you can find all data files and build instructions at github.com/blab/zika-colombia. Finally, you can also interactively explore the phylogeny at nextstrain.org/community/blab/zika-colombia.

As always, getting global collaborations off the ground requires the effort of many people. We would especially like to thank Juan-David Ramírez for sharing precious samples, and Diana Rojas and Betz Halloran for their work on getting the data sharing agreement with the INS in place. We are also incredibly grateful to the virology team at the INS, especially Katherine Laiton-Donato, Lissethe Pardo, Dioselina Peláez-Carvajal, and Marcela Mercado-Reyes, who worked tirelessly with us down in Bogotá to ensure this study's success. We sincerely hope that this is just the first of such collaborations!

In 2016 and 2017, there were mumps virus outbreaks reported across the United States, with 5,629 total cases reported by the CDC. Washington state experienced one of the highest incidence rates nationwide, reporting 891 confirmed cases between October 2016 and September 2017. To characterize the timing and number of mumps introductions into Washington, and to describe how the virus spread within the state, we have been collaborating with the Washington State Department of Health to sequence mumps viruses from buccal swabs collected throughout the outbreak. You can find all of our protocols for sample processing and sequencing on our mumps-seq GitHub page, and a downloadable FASTA file with all consensus sequences here. We have sequenced 72 near-complete mumps genomes from Washington so far, and will probably take a break from sequencing for a while to see what we can learn from these genomes. All of our genomes have been added to nextstrain.org/mumps, where they can be viewed in the context of other mumps viruses sequenced from 2016/2017 outbreaks in the US and Canada.

So far, preliminary analysis of our 72 genomes shows that the Washington mumps outbreak consists of at least 5 separate introductions of mumps into Washington. The majority of sequences (66/72) cluster within a single clade that we will call the primary outbreak clade. Viruses in the primary outbreak clade are nested within the diversity of viruses from the Massachusetts outbreak, and are closely related to viruses from Arkansas and Kansas, suggesting that outbreaks across the United States are related. The vast majority of viruses from British Columbia form their own distinct clade, suggesting that viruses from Washington are more closely related to those from Massachusetts than they are to those from British Columbia. However, there is a single British Columbia sequence nested within the primary outbreak clade, suggesting some limited transmission between Washington and British Columbia.

The phylogeny also shows 4 additional introductions of mumps into Washington, although these introductions led to only a small number of cases. These smaller introduction events are also linked to Massachusetts and Kansas viruses, suggesting that there may be epidemiologic links among these states. Together, these data suggest that there were multiple introductions of mumps viruses into Washington state, but that most of these introductions did not lead to sustained transmission that we were able to sample. A single introduction event, occurring around March 2016 (95% CI: Feb 2016 – April 2016), likely seeded the majority of transmission during the outbreak. Our phylogenetic tree suggests that this introduction likely stemmed from Massachusetts; however, many US states that reported mumps cases have not been sampled, so we cannot rule out that mumps passed through another intermediate location.

We will be continuing to analyze these data to more firmly estimate the timing of introduction events, and to estimate transmission rates among different subsets of the Washington population. We would like to give a huge shoutout to our collaborators at the Washington State Department of Health, especially Chas DeBolt, Ailyn Perez-Osorio, Misty Lang, and Nhan Le, for all of their work characterizing the outbreak and collecting and preparing samples. We would also like to send a huge thank you to Jen Gardy, Jeff Joy, Pardis Sabeti, and Shirlee Wohl for sharing their sequence data with us on nextstrain.org. None of this work would make sense without the ability to contextualize our outbreak, so thank you all so much for making this possible!

Our newest study is now available as a preprint on bioRxiv! This work was done by Sidney Bell, Leah Katzelnick, and Trevor Bedford. You can find all the data, code, figures, etc. in the repository here.

The tl;dr

  • Dengue virus is a mosquito-borne, emerging pathogen common in tropical regions
  • While we used to think there were only 4 kinds of dengue that your immune system recognizes as distinct from one another, we identify at least 12 antigenically distinct kinds of dengue.
  • We also found that in Southeast Asia, population immunity and antigenic differences between dengue virus strains have a big impact on which kinds of dengue will circulate in a given season.

Background: dengue virus is an emerging pathogen, and kind of a weird virus

Dengue virus circulates widely in tropical regions of the world, and about half of the global human population is at risk of infection each year. While most infections are mild, a small subset of cases develop into very severe dengue fever, causing 10,000-15,000 deaths each year. Most countries in Southeast Asia and South America experience annual dengue epidemics, but it's difficult to predict year-to-year which countries will have a mild or severe dengue season. This is problematic because our best bet for preventing dengue-related fatalities is preemptively controlling the mosquito population (we don't yet have a good dengue vaccine).

One of the reasons we can't yet vaccinate against dengue or foresee severe epidemics is that, from a virologist's perspective, dengue virus is a little weird. For most viruses, you get sicker the first time you're infected ("primary infection") than the second ("secondary infection"). That's because during your primary infection, your body generates an immune response and then remembers how to fight the virus the second time around. Most dengue cases follow this pattern, where the immune response learned during your primary infection helps protect your body against a secondary infection. For some dengue cases, however, this immune memory can backfire, and can actually help the virus cause severe disease. Usually, this happens when the specific strains of dengue virus that caused your primary and secondary infections look pretty different from each other to your immune system (we'd say they have distinct "antigenic phenotypes").

Question 1: how many different kinds of dengue are there (as far as your immune system is concerned)?

So, we know that differences between antigenic phenotypes are important for determining whether your immune response is protective or harmful during your secondary infection. The tricky part, though, is that we don't know exactly which viral characteristics change its antigenic phenotype (how it looks to your immune system), or even how many distinct antigenic phenotypes there are. If we can figure this out, it could potentially help researchers design better vaccine candidates (by choosing strains to include in a vaccine that will generate a wholly protective immune response).

We have a few clues to start with. Genetically, dengue viruses are super diverse: there are four major types of dengue virus (called "serotypes"), and we know that these serotypes are antigenically distinct from one another. It's historically been assumed that all the virus strains within a given serotype are antigenically identical to one another, even though they may be pretty genetically diverse (spoiler: turns out this isn't the whole story). Three years ago, Leah Katzelnick and her coauthors did some beautiful work to experimentally measure how antigenically different a bunch of dengue strains are from one another (i.e., how well does the immune response generated against your primary infection with strain A protect you against secondary infection with strain B). Their initial study suggested that some dengue strains from the same serotype may be antigenically different from each other. In our new study, we reanalyzed this dataset (hooray for open science!) to try and understand how extensive these differences were, what caused them, and how much they mattered.

Answer 1: at least 12!

Using a phylogeny-based model, we mapped how genetic relationships between viruses correspond to the antigenic differences between them. We found that as viruses diverge genetically, they also tend to diverge antigenically. We identified at least 12 distinct antigenic phenotypes of dengue virus, suggesting that the antigenic relationships between dengue strains are more nuanced than previously believed. This also suggests that there is much more to learn about how dengue evolves and interacts with the immune system. We hope this will help other researchers' important efforts to design a broadly protective dengue vaccine.
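
To make the idea of a phylogeny-based model a bit more concrete, here is a minimal sketch of one way such a model can be set up: each branch of the tree carries some amount of antigenic change, and the antigenic distance between two viruses is the sum of those amounts along the path connecting them. The toy tree, the `parent` and `branch_effect` dictionaries, and all of the numbers are made up for illustration; this is not the code or the fitted values from the study.

```python
def branches_to_root(node, parent):
    """Return the branches (named by their child node) between `node` and the root."""
    branches = []
    while node in parent:  # the root has no parent entry
        branches.append(node)
        node = parent[node]
    return branches


def antigenic_distance(tip_a, tip_b, parent, branch_effect):
    """Sum hypothetical per-branch antigenic effects along the path between two tips."""
    lineage_a = set(branches_to_root(tip_a, parent))
    lineage_b = set(branches_to_root(tip_b, parent))
    # Branches found in only one lineage lie on the path connecting the two tips;
    # shared branches (above their common ancestor) cancel out.
    on_path = lineage_a ^ lineage_b
    return sum(branch_effect.get(branch, 0.0) for branch in on_path)


# Toy tree (assumed): root -> n1 -> {A, B}, root -> n2 -> C
parent = {"A": "n1", "B": "n1", "C": "n2", "n1": "root", "n2": "root"}
# Assumed antigenic change on each branch; branches not listed contribute zero
branch_effect = {"n1": 1.5, "C": 0.5}

print(antigenic_distance("A", "B", parent, branch_effect))  # close relatives
print(antigenic_distance("A", "C", parent, branch_effect))  # separated by two branches with antigenic change
```

In this toy example, A and B sit on the same side of the only branches with antigenic change, so they look identical to the immune system, while A and C are separated by both.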

Question 2: Ok, so how much do these antigenic differences matter in the real world?

We also wanted to know whether this could help us predict which dengue virus strains would be circulating in a given season. Logically, as a virus circulates in a population, the proportion of the population that is susceptible to infection with that virus — and other antigenically similar viruses — decreases over time as more people acquire immunity. Thus, virus strains that are antigenically distinct from recently circulating strains are better able to escape this population immunity, allowing them to infect more people and circulate more broadly.
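
As a rough illustration of this logic, the sketch below estimates how much of a population is still effectively susceptible to a candidate strain, given how many people were recently infected with other strains and how antigenically far apart those strains are. Everything here is an assumption made for the example: the function name, the linear decay of protection with antigenic distance, the `loss_per_unit` parameter, and the numbers. It is a toy calculation, not the model used in the study.

```python
def population_susceptibility(candidate, past_frequencies, distances, loss_per_unit=0.5):
    """Toy estimate of the fraction of a population susceptible to `candidate`.

    past_frequencies: fraction of the population recently infected by each strain.
    distances: antigenic distance for each unordered pair of strains.
    loss_per_unit: assumed protection lost per unit of antigenic distance.
    """
    immunity = 0.0
    for strain, frequency in past_frequencies.items():
        if strain == candidate:
            distance = 0.0
        else:
            distance = distances[frozenset((candidate, strain))]
        # Assumed: protection falls off linearly with distance and bottoms out at zero
        protection = max(0.0, 1.0 - loss_per_unit * distance)
        immunity += frequency * protection
    return 1.0 - min(immunity, 1.0)


# Made-up example: strains A and B circulated recently; C is a new candidate
past_frequencies = {"A": 0.6, "B": 0.2}
distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 1.0,
    frozenset(("B", "C")): 2.0,
}

for strain in ("A", "B", "C"):
    print(strain, round(population_susceptibility(strain, past_frequencies, distances), 2))
```

Under these assumptions, the strain that circulated most widely (A) faces the least susceptible population, which is the intuition behind antigenically distinct strains having an advantage.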

Answer 2: Population immunity has a big impact on which kinds of dengue will circulate in a given season.

We leveraged our new understanding of dengue antigenic phenotypes to estimate the proportion of the population in Southeast Asia that was immune to each kind of dengue virus over time. We found that this enabled us to predict which dengue strains would predominate in a given season with reasonable accuracy. This demonstrates that fluctuations in the dengue virus population are strongly driven by antigenic differences between viral strains. We hope that this can eventually help us predict which countries will experience a mild versus severe dengue epidemic in a given season, which would help public health officials allocate resources most effectively. However, it's important to keep in mind that predictions of viral population dynamics are kind of like your local weatherman's predictions: while we can use our understanding of underlying processes to make sound predictions, there's always some level of uncertainty (in our case, our predictions of whether a serotype will increase or decrease over the next 5 years are accurate about 80% of the time).

Thank you!

We would like to thank Richard Neher, Molly OhAinle, David Shaw, Paul Edlefsen, Michal Juraska, and all members of the Bedford Lab for useful discussion and advice.

We also thank the scientists who helped generate and analyze the original antigenic dataset — our research wouldn't be possible without your commitment to open science: Judith Fonville, Gregory D Gromowski, Jose Bustos Arriaga, Angela Green, Sarah James, Louis Lau, Magelda Montoya, Chunling Wang, Laura VanBlargan, Colin Russell, Hlaing Myat Thu, Theodore Pierson, Philippe Buchy, John Aaskov, Jorge Muñoz-Jordán, Nikos Vasilakis, Robert Gibbons, Robert Tesh, Albert Osterhaus, Ron Fouchier, Anna Durbin, Cameron Simmons, Edward Holmes, Eva Harris, Stephen Whitehead, Derek Smith.

Finally, a huge shout out to the open-source developers who build and maintain the packages we heavily rely on to get science done: numpy, pandas, scipy, biopython, scikit-learn, matplotlib, seaborn, and many others.

This year has made us appreciate how little we understand about seasonal H3N2 influenza despite extensive research efforts since its emergence in 1968. One unavoidable fact is that H3N2 evolves rapidly, accumulating mutations to its hemagglutinin surface protein (HA) that enable it to escape our acquired immunity from previous infections or vaccinations. Most efforts to understand HA evolution focus on mutations that increase viral fitness by enabling escape from the immune system. These escape mutations are often described in the context of a fitness trade-off with viral replication or transmission. However, there have been no systematic studies of how individual mutations to HA affect these core replicative functions of H3N2. In a recent collaboration with the Bloom lab, led by Juhye Lee, we investigated the functional effects of all possible single amino-acid mutations to the HA of a single, recent H3N2 strain.

Juhye performed deep mutational scanning experiments to quantify the effects of mutations to HA on viral growth in cell culture. These experiments measured the preferred amino acid composition at each position in HA, allowing us to calculate the fitness effect of mutations from one amino acid to another. To determine whether our measurements approximated the fitness of mutations in natural populations, we investigated the evolutionary fates of 1321 mutations in H3N2 strains sampled from 1968 to 2018. Specifically, we compared the maximum global frequency each mutation reached in nature to its corresponding experimental mutational effect. We found that successful mutations in nature generally had neutral or beneficial experimental mutational effects, while unsuccessful mutations had deleterious mutational effects. This correlation between experimentally measured and natural fitness effects of H3N2 mutations disappeared when we substituted our H3N2 measurements with previous measurements for a lab-adapted H1N1 strain. Indeed, we observed a significant shift in preferred amino acid compositions between H3N2 and H1N1. It is possible that this shift reflects differences between the two viral lineages in both the folding of HA and the selective pressures constraining HA evolution.
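
For readers wondering how a per-site amino acid preference turns into the effect of a specific mutation, here is a minimal sketch. It assumes one common convention, in which the effect of a mutation is the log2 ratio of the mutant amino acid's preference to the wildtype amino acid's preference at that site. The function name and the preference values are made up for illustration and are not taken from the study's data.

```python
import math


def mutational_effect(site_preferences, wildtype, mutant):
    """Log2 ratio of mutant to wildtype preference at one site (assumed convention).

    Negative values suggest the mutation is deleterious in the experiment,
    values near zero suggest it is roughly neutral, and positive values
    suggest it is beneficial.
    """
    return math.log2(site_preferences[mutant] / site_preferences[wildtype])


# Made-up preferences for a handful of amino acids at a single HA site
site_preferences = {"K": 0.5, "R": 0.25, "E": 0.125, "Q": 0.125}

print(mutational_effect(site_preferences, "K", "R"))  # -1.0: R is half as preferred as K
print(mutational_effect(site_preferences, "E", "K"))  # 2.0: K is four times as preferred as E
```

Comparing values like these against the highest frequency each mutation reached in nature is the kind of check described above.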

Our results suggest that experimental measurements of mutational effects in HA can help predict the evolution of seasonal influenza within a specific lineage. While these measurements do not represent the true fitness of mutations in nature, they are an important first step toward filling a gap in our understanding of H3N2 evolution. This study also prepares us for future investigations of how mutations allow viruses to escape detection by human antibodies. The combination of deep mutational scanning measurements for viral growth and immune escape should allow us to build more accurate, experimentally-informed evolutionary models for seasonal influenza.

We're looking for a developer to ramp up our efforts with Nextstrain.org. Job advertisement follows:

A developer position is available immediately in the Bedford lab at the Fred Hutch to improve backend infrastructure of Nextstrain.org and work with public health and academic partners to streamline data sharing and real-time analysis.

In collaboration with the Neher lab at the University of Basel, we've built the Nextstrain platform to conduct real-time genomic epidemiology to aid understanding of pathogen spread and improve outbreak response. Pathogen genomic data can reveal otherwise hidden connections between infections and be used to infer patterns of epidemic growth, geographic spread and adaptive evolution. However, only through open sharing of genomic data can these inferences be fully realized. Our aim with Nextstrain.org is to provide a platform for both data sharing and analysis. This platform won the Open Science Prize in Feb 2017, has already been instrumental in the analysis of Ebola spread in West Africa and Zika spread in the Americas, and is used by the World Health Organization to aid in the process of influenza vaccine strain selection.

The codebase is completely open source at github.com/nextstrain. Currently, we use a data parsing / cleaning module to canonicalize data from disparate sources, a RethinkDB database to host clean data, an informatic / pipeline module to process genomic data into annotated evolutionary trees and a browser-based frontend to display interactive visualizations. All backend / compute is written in Python and all frontend is written in JavaScript. At this point, the frontend has seen more development than the backend. We are now looking to improve backend infrastructure to allow easier sharing of data from outside groups and to automatically run builds when new data appears. This developer position would be in charge of backend infrastructure, but also work directly with public health and academic partners to incorporate new datasets and make an effective platform for applied genomic epidemiology.

The ideal candidate would have expertise in Python, databasing, bioinformatics and compute infrastructure. Database knowledge is required to host genomic data and provide APIs to outside groups to push data to a shared database. Informatics and compute knowledge is necessary to automatically spin up builds as new data appears. This broadly aligns with experience in backend web development. Experience with frontend web development, particularly JavaScript, React and D3, would be a plus, but not at all a requirement. The ideal candidate should also have excellent communication skills, as interfacing with collaborators is a key aspect of the position.

Primary job responsibilities include: (1) managing the Nextstrain database, (2) working with collaborators to keep data flowing through the Nextstrain pipeline and (3) building infrastructure to streamline (1) and (2).

The Fred Hutch is located in South Lake Union in Seattle, WA and offers a dynamic work environment with cutting-edge science and computational resources. The position is available immediately with flexible starting dates. Informal inquiries are welcome. Applications will be accepted until the position is filled. We offer a competitive salary commensurate with skills and experience, along with benefits. The Fred Hutch and the Bedford lab are committed to improving diversity in the computational sciences. Applicants of diverse backgrounds are particularly encouraged to apply. Depending on the applicant, this position could be a full-time salaried employee, a part-time employee or a contracted consultant.

For more information about the lab, please see our website at bedford.io. To apply for the position please send (1) current resume, (2) code samples or links to published/distributed code and (3) contact information for two references to trevorobfuscate@bedford.io.

In 2016 and 2017, mumps outbreaks were reported in several countries, and the CDC reported 5,629 cases within the United States. Washington state has among the highest incidence rates in the country, reporting 891 confirmed cases between October 2016 and September 2017. We are collaborating with the Washington State Department of Health to sequence mumps virus samples collected from throughout the outbreak. We will use these data to determine the number and size of distinct transmission clusters, describe where distinct transmission clusters were likely introduced from, and describe how the virus spread within the state. This analysis is greatly aided by recent mumps virus sequencing efforts by the British Columbia CDC and the Broad Institute, as pooling data provides critical context.

We recently completed sequencing the first batch of mumps virus genomes provided by the Washington State Department of Health and have released the first 27 draft genomes on nextstrain.org/mumps. Protocols for sample preparation and sequencing are available at github.com/blab/mumps-seq. The 27 sequences from Washington state were collected between December 2016 and April 2017. The vast majority of these sequences (25 out of 27) cluster together within a single large clade, which we will refer to as the primary outbreak clade. This finding indicates that the majority of transmission within Washington likely occurred due to a single introduction of mumps followed by sustained person-to-person transmission. This conclusion may change as we sequence further viruses, which may provide evidence for additional clades of circulating viruses. The primary outbreak clade is closely related to all sequenced viruses sampled from the Arkansas outbreak, and is nested within the diversity of viruses sampled during Massachusetts outbreaks in 2016. Thus, we hypothesize that mumps outbreaks within the US are likely related. We also find a single genome from Washington state that clusters outside of the primary outbreak clade, which could represent a separate introduction of mumps virus to Washington that did not yield sustained person-to-person transmission. We note that further sequencing and more sophisticated genomic analysis are required to confidently determine the total number of introductions that occurred, and how each introduction contributed to observed transmission. Finally, the primary outbreak clade also includes a single sequence from British Columbia, providing evidence for some degree of transmission between the US and Canada.

We are aiming to sequence ~100 clinical samples in total, and will continue sequencing and adding data to nextstrain.org/mumps in the coming months. Stay tuned!