RethinkDB database to support real-time virus analysis

ZIKA Pipeline Notes


  1. Make sure environment variables for connecting to fauna are set.
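A quick way to confirm this before running any vdb command is sketched below. The variable names RETHINK_HOST and RETHINK_AUTH_KEY are an assumption about a typical fauna setup and do not come from these notes, and the function name is hypothetical; substitute whatever your installation expects.

```shell
# Sketch: report any connection variables that are unset or empty.
# RETHINK_HOST and RETHINK_AUTH_KEY (see usage) are assumed names.
check_fauna_env() {
  missing=0
  for name in "$@"; do
    # Indirect variable lookup that also works in plain POSIX sh
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check_fauna_env RETHINK_HOST RETHINK_AUTH_KEY
```

The function returns non-zero if any named variable is missing, so it can gate the rest of a script.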

Upload via ViPR and update citations

ViPR sequences

  1. Download sequences
    • Select year >= 2013 and genome length >= 5000
    • Download as Genome Fasta
    • Set Custom Format Fields to 0: GenBank Accession, 1: Strain Name, 2: Segment, 3: Date, 4: Host, 5: Country, 6: Subtype, 7: Virus Species
    • Alternatively, download through the ViPR API:
  curl ",strainname,segment,date,host,country,genotype,species&output=fasta" |\
  tr '-' '_' |\
  tr ' ' '_' |\
  sed 's:N/A:NA:g' >\

The search-and-replace commands (tr, sed) are necessary because the API returns FASTA headers like:

>KY241742|ZIKV_SG_072|N/A|2016-08-28|Human|Singapore|Asian|Zika virus

but these need to match the headers downloaded through the GUI, where hyphens and spaces appear as underscores and N/A appears as NA.
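As a quick check, the same substitutions can be run on the example API header above. The function name is hypothetical; the tr and sed commands are the ones from the curl pipeline.

```shell
# Apply the same substitutions as the curl pipeline to text on stdin.
normalize_vipr_headers() {
  tr '-' '_' | tr ' ' '_' | sed 's:N/A:NA:g'
}

echo '>KY241742|ZIKV_SG_072|N/A|2016-08-28|Human|Singapore|Asian|Zika virus' | normalize_vipr_headers
# prints >KY241742|ZIKV_SG_072|NA|2016_08_28|Human|Singapore|Asian|Zika_virus
```

Note that tr works on every line of its input, not only on header lines.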


  2. Move downloaded sequences to fauna/data
  3. Extract GenomicFastaResults.tar.gz and rename the extracted file to GenomicFastaResults.fasta
  4. Upload to vdb database
    • python3 vdb/ -db vdb -v zika --source genbank --locus genome --fname GenomicFastaResults.fasta
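The extract-and-rename step can be scripted roughly as follows. This is a sketch that assumes the archive holds exactly one file; the function name is hypothetical, and the name of the file inside the archive is read from tar rather than guessed.

```shell
# Sketch: unpack a ViPR tarball and give its single member a fixed name.
# Assumes the archive contains exactly one file.
extract_vipr_fasta() {
  archive=$1
  target=$2
  dir=$(dirname "$archive")
  # Ask tar for the member name instead of hard-coding it
  member=$(tar -tzf "$archive" | head -n 1)
  tar -xzf "$archive" -C "$dir"
  mv "$dir/$member" "$target"
}

# Usage, from the fauna directory:
#   extract_vipr_fasta data/GenomicFastaResults.tar.gz data/GenomicFastaResults.fasta
```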


  • Update citation fields
    • python3 vdb/ -db vdb -v zika --update_citations
    • Updates the authors, title, url, journal and puburl fields from GenBank files
    • If you get ERROR: Couldn't connect with entrez, just run the command again
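If the Entrez connection error shows up often, the rerun can be automated. The helper below is a generic sketch and not part of fauna; the function name and the RETRY_DELAY variable are hypothetical.

```shell
# Sketch: rerun a command until it succeeds, up to a maximum number of attempts.
# RETRY_DELAY (seconds between attempts, default 5) is a hypothetical knob.
retry() {
  max=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage: retry 3 followed by the --update_citations command shown above
```

This only helps if the command exits with a non-zero status when the connection fails, which is an assumption worth checking first.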

Download from Fauna, parse, compress and push to S3

Download from Fauna

python3 vdb/ \
  --database vdb \
  --virus zika \
  --fasta_fields strain virus accession collection_date region country division location source locus authors url title journal puburl \
  --resolve_method choose_genbank \
  --fstem zika

This produces the file data/zika.fasta, with FASTA header fields in the order given to --fasta_fields.
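Before parsing, it can be worth confirming that every header carries the expected number of fields (15 in the command above). The sketch below assumes the fields are separated by '|'; the function name is hypothetical.

```shell
# Sketch: list FASTA headers whose field count differs from the expected number.
# Assumes '|' separates the header fields.
check_header_fields() {
  fasta=$1
  expected=$2
  awk -F'|' -v n="$expected" '
    /^>/ && NF != n { print FILENAME ": line " NR ": " NF " fields"; bad = 1 }
    END { exit bad + 0 }
  ' "$fasta"
}

# Usage: check_header_fields data/zika.fasta 15
```

The function exits non-zero if any header has the wrong field count. A field value that itself contains '|' would be reported as a mismatch.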


Parse

augur parse \
  --sequences data/zika.fasta \
  --output-sequences data/sequences.fasta \
  --output-metadata data/metadata.tsv \
  --fields strain virus accession date region country division city db segment authors url title journal paper_url \
  --prettify-fields region country division city

This results in the files data/sequences.fasta and data/metadata.tsv.
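A simple sanity check after parsing is to compare the number of sequences with the number of metadata rows. This is a sketch: it assumes metadata.tsv has a single header line, ends with a newline, and has no newlines embedded in fields. The function name is hypothetical.

```shell
# Sketch: compare the number of FASTA records with the number of metadata rows.
check_counts() {
  # grep -c exits non-zero on zero matches, so guard it with || true
  n_seq=$(grep -c '^>' "$1" || true)
  # Subtract one for the header line of the TSV
  n_meta=$(($(wc -l < "$2") - 1))
  echo "sequences: $n_seq, metadata rows: $n_meta"
  [ "$n_seq" -eq "$n_meta" ]
}

# Usage: check_counts data/sequences.fasta data/metadata.tsv
```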


Compress

zstd -T0 data/sequences.fasta
zstd -T0 data/metadata.tsv

This results in the files data/sequences.fasta.zst and data/metadata.tsv.zst.

Push to S3

nextstrain remote upload s3://nextstrain-data/files/zika/ data/sequences.fasta.zst data/metadata.tsv.zst

This pushes files to S3 to be made available at and

Run zika workflow

See instructions at