You know how when you're traveling you pick up postcards with every intention of writing them and sending them from your exotic location, only to mail them out once you're home? This post is kind of like that.

I'm writing this from my desk in Seattle, but I was in fact away... in Brazil! I had the incredible opportunity to spend two weeks sequencing Zika genomes from clinical samples with Josh Quick (of Ebola sequencing fame) and Sarah Hill from Oliver Pybus' group. You might be familiar with the ZiBRA sequencing road trip that Trevor was a part of back in June. This was a follow-up trip with the goal of generating a lot more genomes with a newly optimized protocol.

Plans for this trip were hatched about three weeks before we all flew down, in a pub in Cornwall where I was for PoreCamp 2016. The vast majority of the logistical details were hashed out over WhatsApp. This type of planning for a pretty major trip seemed kind of crazy and unlikely to work, and yet it totally did. And really, kind of crazy but managing to work is a pretty good description of the trip in its entirety.

The good news first! Josh’s freshly minted protocol worked very well and we were able to sequence 20 genomes on the MinION with over 75% genome coverage (and some with as high as 98% genome coverage). 20 genomes might not sound like many but there were only around 90 genomes publicly available when we flew down, so proportionally this was a lot of information to get out of the trip.

Behind the scenes, though, we were riding a roller coaster of success and issues, and given that our experiences are probably pretty typical of outbreak sequencing in the field, they warrant some description. We started off in Salvador, where all the RNA extracted during the road trip had been stored. The first major issue we had to deal with was managing contamination. With 45 cycles of PCR needed to amplify sufficient Zika cDNA for sequencing, we were pretty concerned about keeping amplicons away from any RNA yet to be reverse transcribed. This pre- and post-PCR separation was a little more challenging in our Salvador lab, which had a single thermal cycler (we have a two-step PCR protocol) and no separated lab spaces. We ended up turning a biosafety cabinet into our pre-PCR area, complete with a tiny 8-tube thermal cycler that we ran off of a battery pack and programmed using Josh's computer. Given that we could UV the whole setup after every run, this actually worked extraordinarily well, and despite so many rounds of amplification we had clean negative controls.

As we processed more and more samples we realized that there didn't seem to be as sharp of a relationship between Ct and sequence-ability as we were expecting. Josh figured that RNA degradation was likely to blame for this, as the samples had probably experienced some freeze-thaw during their transportation along the coast of Brazil. There wasn't much that could be done about the degradation at this point, but it did mean that we tried to work as much as possible from fresh extractions in São Paulo.

Another hurdle was ensuring that we had complete epidemiological data for the samples we sequenced. Trevor described some of the challenges with wrangling metadata on the road trip, but in the end all that work meant that the metadata was pretty complete for the samples in Salvador. This was more of a problem in São Paulo, where samples had been received by multiple people, in multiple labs. There really isn't much to say here except you'll probably need to be pretty dogged in your pursuit and you'll probably feel like a bit of a nag. However, the inferential worth of the sequences drops a lot when the associated epidemiological information isn't available, so it really is worth the sweat and tears to hunt it down.

So tl;dr. Thinking of doing some outbreak field sequencing? Awesome, if you can get a good team together it will be a lot of fun (dare I say a sequencing vacation?). Be prepared to be resourceful and creative, to constantly troubleshoot, and to cross your fingers a lot. Try and have fresh RNA, battle to make the lab as clean as you can, and work hard to verify and manage your metadata.

Richard Neher and I have compiled another report on recent patterns of seasonal influenza virus evolution with an eye toward projecting forward to the 2016 and 2017 flu seasons. All analyses are based on the nextflu platform. Doing weekly updates on nextflu has forced us to keep pipelines current and has made putting together these reports not such a chore.

This time around, the biggest news is within H3N2, where we're seeing the rapid spread of a subclade within 3c2.a viruses. This subclade is primarily distinguished by the HA1:171K mutation (along with changes HA2:77V/155E). We predict these viruses will predominate in the future H3N2 population. However, we lack the antigenic data to say whether the vaccine needs updating. It's possible to have this sort of genetic evolution without strong antigenic evolution necessitating a vaccine update.

In putting together the report this time, it was helpful referring to past reports from last September and this February. Gratifyingly, in February we stated:

Barring substantial changes in other clades, we predict the (HA1:171K, HA2:77V/155E) variant to dominate.

This is exactly what's come to pass in the last 6 months. As we keep doing this, we'll be able to compile hits-and-misses and see where the intuition and models are succeeding and where they are failing.

22 Aug 2016 by trvrb

This has been a busy, but fun and productive, summer. Lots of things going on. I had a couple conferences, traveled to Brazil in June to help with Zika sequencing and traveled to South Korea for a collaborative visit. In addition to lab things, I've been working with Charlton Callender, Richard Neher and Colin Megill on the nextstrain project, trying to get all the pieces of the pipeline together. We're basically doing a full refactor from the existing nextflu codebase to include a database to manage sequence and serological data, improved build pipelines and more flexible visualization tools. I'll try to write more on the nextstrain project at a later date. We're trying to have a prototype ready by Dec 1 when Open Science Prize judging will be held.

Lots of activity in the lab:

I'm looking forward to the coming year. We have lots of momentum at this point and it will be fun to see the science that's produced.

I'm at the Rio airport now, heading home after 9 days in Brazil as part of the ground team of the ZiBRA project. As part of the team, I traveled from Natal to Recife along the northeastern coast collecting clinical samples for mobile Zika genome sequencing and analysis. This has been an illuminating experience and I'm grateful to Nick, Nuno, Luiz and the rest of the team for inviting me to be part of this.

I truly believe that pathogen genome analysis can contribute significantly to epidemiological understanding and outbreak response. However, for this to work, genomes need to be produced and shared quickly enough so that epidemiological insights are actionable. This was a major issue for much of the West African Ebola outbreak, limiting the utility of genomic approaches. The situation is somewhat better for the ongoing Zika epidemic in the Americas in that multiple groups are releasing a genome here and a genome there, but overall depth is still lacking with just 64 outbreak genomes available at this time. The ZiBRA project is an attempt to do real-time genomic surveillance of Zika in Brazil. If all goes according to plan, this project will rapidly provide a dataset for downstream analysis of Zika evolution and epidemiology, aiding understanding of virus spread and epidemic dynamics.

The trip was incredibly eye-opening for me in terms of the messy reality of viral surveillance and the even-more-messy details of Zika surveillance in Brazil. The basic pipeline for Zika surveillance in Brazil by the Ministry of Health (much like other viral surveillance systems) goes something like:

  1. Patient presents at a clinic with symptoms consistent with infection (fever, rash, etc.).
  2. The clinician sends a blood sample to the regional diagnostic laboratory (these are referred to as LACENs).
  3. The LACEN extracts viral RNA and runs RT-PCR to confirm viral presence in the sample.

The RT-PCR diagnostic is particularly important as the clinical symptoms of Zika, dengue and chikungunya are difficult to distinguish. With the road trip, we were able to bring in reagents and expertise that the LACENs lacked and burn through a large number of banked clinical specimens to search for additional RT-positives. In some cases, we were able to confirm Zika diagnoses of pregnant women who presented the week before. We reported positive and negative RT diagnostics back to the LACENs. RT-positive samples were then brought forward for PCR amplification and MinION sequencing.

I did help a bit with the lab-work, but I ended up mostly running point on metadata. As might be expected given the circumstances, the lab work was incredibly chaotic and I spent most of my time trying to keep sample data from unraveling. To keep epi metadata attached to a sample required maintaining a linkage between the numbers written from tube-to-tube-to-tube and the original LACEN ID. It also required digging through the LACEN diagnostic reports to pull in important epi metadata like date of collection and municipality of residence. I've never quite appreciated before the degree to which data wants to come apart if continual attention is not paid (proper data is an ordered state that is constantly under attack by entropic forces). I hope I've left the team with systems in place to promote further metadata collection.
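The tube-to-tube-to-tube linkage described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the system we actually used on the trip: the label formats (`LACEN-0001`, `RNA-3` and so on), the field names and the function names are all made up for the example.

```python
from typing import Dict

def resolve_lacen_id(tube_label: str, parent_of: Dict[str, str]) -> str:
    """Follow tube -> parent-tube links until reaching a label with no
    recorded parent, which is taken to be the original LACEN ID."""
    seen = set()
    current = tube_label
    while current in parent_of:
        if current in seen:
            # A relabelling loop means the sample sheet is corrupt.
            raise ValueError(f"cycle in relabelling chain at {current!r}")
        seen.add(current)
        current = parent_of[current]
    return current

def attach_metadata(tube_label: str,
                    parent_of: Dict[str, str],
                    lacen_records: Dict[str, Dict[str, str]]) -> Dict[str, object]:
    """Join a working tube back to the epi metadata in the LACEN report,
    flagging samples whose report could not be found."""
    lacen_id = resolve_lacen_id(tube_label, parent_of)
    record = lacen_records.get(lacen_id)
    if record is None:
        return {"tube": tube_label, "lacen_id": lacen_id, "missing": True}
    return {"tube": tube_label, "lacen_id": lacen_id, "missing": False, **record}

# Hypothetical example: one sample relabelled at each lab step.
parent_of = {"PCR-07": "cDNA-12", "cDNA-12": "RNA-3", "RNA-3": "LACEN-0001"}
lacen_records = {
    "LACEN-0001": {"collection_date": "2016-05-30", "municipality": "Natal"},
}
linked = attach_metadata("PCR-07", parent_of, lacen_records)
orphan = attach_metadata("PCR-99", parent_of, lacen_records)
```

The point of keeping every relabelling as an explicit link is that any tube, however many steps downstream, can be walked back to its LACEN ID, and samples with no matching report get flagged rather than silently dropped.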

We finished base calling and assembly on the first MinION runs on June 8, but realized they needed resequencing to have good coverage. That said, we should be releasing genomes soon and hope to keep a flow of genomes going through the next few months. I'm super excited to be able to rapidly incorporate these genomes into nextstrain.org and help with tracking Zika evolution and epidemic spread.

We're looking for a programmer to help with the current push towards real-time analysis of virus evolution. The advertisement follows:

A programmer position is available immediately in the Bedford lab at the Fred Hutch to develop inference algorithms and interactive visualizations of viral outbreaks using DNA/RNA sequence data.

There has been remarkable progress in phylodynamic methods which use viral genetic sequence data to infer patterns of epidemic growth and geographic spread as well as patterns of adaptive evolution and strain turnover. However, until recently, these methods were solely applied in retrospective analyses to understand past events. With increasing availability and timeliness of sequence data, it is now becoming possible to perform and share phylodynamic analyses in near real-time.

Our group is leading this transition to real-time analysis and prediction. Current efforts by the lab include analyses of influenza (nextflu.org), Ebola (ebola.nextstrain.org) and Zika (nextstrain.org/zika/) viruses. These sites are already being used by the Centers for Disease Control and the World Health Organization, particularly for influenza vaccine strain selection, and we believe further development will lead to substantial public health benefit. This project has grown to the point where we need a full-time programmer to generalize, refactor, and maintain our platform for data ingestion, processing and visualization.

The ideal candidate would have experience in Python (the informatic processing pipeline augur is written in Python) and JavaScript (the browser-based visualization auspice is written in JavaScript). Experience with databases (we are using rethinkdb for sequence organization) or web application programming would be a plus. The new team member would help refactor the current codebase into a more general and extensible platform for informatic processing and would also speed up the auspice visualization through reactive JSON pull-downs. Additional development would focus on extending pipelines to other viruses / pathogens and building new features for current viruses, as well as implementing a robust Docker-based deployment pipeline.

The Fred Hutch is located in South Lake Union in Seattle, WA and offers a dynamic work environment with cutting-edge science and computational resources. The position is available immediately with flexible starting dates. Informal inquiries are welcome. Applications will be accepted until the position is filled. We offer a competitive salary commensurate with skills and experience, along with benefits. The Fred Hutch and the Bedford lab are committed to improving diversity in the computational sciences. Applicants of diverse backgrounds are particularly encouraged to apply.

For more information about the lab, please see our website at bedford.io. To apply for the position please send (1) current resume, (2) code samples or links to published/distributed code and (3) contact information for two references to trevorobfuscate@bedford.io.

Richard Neher and I have compiled a report on recent patterns of seasonal influenza virus evolution with projections for future clade behavior based on the nextflu platform. The report looks ahead to the 2016-2017 flu season. There is lots of interesting behavior in H3N2 and H1N1pdm viruses. Within H3N2, the clade 3c2.a has continued to predominate throughout the last year and genetic diversity is now accumulating within 3c2.a. The issue now is watching for new variants within 3c2.a that may be antigenically distinct and thus spread through the virus population. H1N1pdm viruses have seen the rapid rise of a novel clade denoted 6b.2 during 2015. This clade is demarcated by the substitutions 84N/162N/216T. The joint substitutions 162N/216T have risen especially rapidly, with this season's Northern Hemisphere epidemic composed almost entirely of 6b.2 viruses. However, detailed antigenic characterization is necessary to determine whether the seasonal influenza vaccine warrants updating.

In collaboration with Dan Neafsey at the Broad Institute, Dyann Wirth at Harvard, Peter Gilbert and Michal Juraska at the Fred Hutch and a large team of researchers, we've just published a paper in the New England Journal of Medicine showing strain-specific vaccine efficacy in a recent malaria vaccine trial. We found that the RTS,S malaria vaccine worked better against strains that were genetically matched to the vaccine antigen compared to unmatched strains. The right-hand figure shows that across study sites, parasites matched to the vaccine strain (3D7) are overrepresented among the control group, indicating that the vaccine better protected against infection by 3D7 than against other strains. Proper statistical analysis shows that there was a 1-year vaccine efficacy of 50% against strains perfectly matched to the vaccine antigen and a 1-year vaccine efficacy of 33% against unmatched strains. The vaccine strain 3D7 is at low (~10%) frequency in Sub-Saharan Africa. This suggests that vaccine efficacy could be straightforwardly improved by just swapping 3D7 for a more common haplotype.

It's interesting to see strain-specific vaccine efficacy for malaria, suggesting that, as with flu, a good vaccine requires matching to the circulating pathogen strains. This was a fun study to participate in. Hopefully, we will see further improvements to the RTS,S vaccine. Even at current efficacy levels, RTS,S is estimated to have a cost effectiveness of ~$150 per disability-adjusted life year saved.

Richard Neher and Boris Shraiman, along with myself, Colin Russell and Rod Daniels have just completed a new analysis of antigenic drift in the seasonal influenza virus. We've put a manuscript on "Prediction, dynamics, and visualization of antigenic phenotypes of seasonal influenza viruses" up on the arXiv and have created an interactive visualization of our model at HI.nextflu.org.

Much previous work on this topic, including my own, has focused on "antigenic cartography" in which antigenic phenotype is embedded into a 2D antigenic map where distances between viruses are proportional to drop in titer in the hemagglutination inhibition (HI) assay. Here, we take a complementary approach where drop in titer is directly mapped to the influenza phylogeny; specific branches are assigned drops in titer. Thus, rather than embedding into 2D Euclidean space, this model assumes embedding into tree space. As a model, it works quite well, allowing prediction of unmeasured titers at high accuracy. In the manuscript, we investigate the correlation between antigenic advancement and clade success, finding that more advanced clades tend to win in the global competition among viruses. However, we observe significant noise when the new clade is at very low frequency.
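To make the tree-space idea concrete, here is a minimal Python sketch of the core step: the predicted drop in titer between two viruses is the sum of the drops assigned to the branches on the path connecting them in the phylogeny. The toy tree, the branch values, the function names and the assumption that drops are in log2 titer units are all illustrative, and the sketch deliberately leaves out anything else the full model in the manuscript may include.

```python
from typing import Dict, List, Optional

def path_to_root(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path

def predicted_titer_drop(a: str, b: str,
                         parent: Dict[str, Optional[str]],
                         branch_drop: Dict[str, float]) -> float:
    """Sum the titer drops on branches along the tree path between a and b.

    Each branch is identified by its child node. Branches above the most
    recent common ancestor are shared by both viruses and do not count.
    """
    ancestors_a = path_to_root(a, parent)
    ancestors_b = path_to_root(b, parent)
    shared = set(ancestors_a) & set(ancestors_b)
    total = 0.0
    for node in ancestors_a + ancestors_b:
        if node not in shared:
            total += branch_drop.get(node, 0.0)
    return total

# Toy tree: root -> n1 -> {A, B}, root -> C. Drops are hypothetical values.
parent = {"root": None, "n1": "root", "A": "n1", "B": "n1", "C": "root"}
branch_drop = {"n1": 2.0, "A": 0.0, "B": 1.0, "C": 0.5}

drop_ab = predicted_titer_drop("A", "B", parent, branch_drop)  # branches A, B
drop_ac = predicted_titer_drop("A", "C", parent, branch_drop)  # branches A, n1, C
```

In this toy example most of the antigenic change sits on the single branch leading to `n1`, so A and B look similar to each other while both are well separated from C. That is the sense in which specific branches, rather than map coordinates, carry the drops in titer.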

A major feature of the cartography-based approach is providing a single pictorial view of the antigenic relationships among viruses. This is enabled by the 2D basis for cartography. The tree-based model that we present here does not permit a single viewpoint. Instead, we provide an interactive visualization of the model, in which users can click on a particular reference virus in the phylogeny and see how other viruses relate to this focal virus. We allow coloring of the phylogeny by raw HI titer, as well as by model expectations, allowing exploration of why the model behaves as it does. We hope this approach will be useful in investigation of antigenic relationships among circulating influenza viruses.