Rethink database to support real-time virus analysis

Flu Pipeline Notes


Upload documents to VDB

  1. Download sequences and meta information from GISAID
    • In EPIFLU, select host as human, select HA as the required segment, and select Submission Date >= the date of the last upload to vdb
    • Ideally download about 5000 isolates at a time; this may require splitting downloads by submission date
    • Download Isolates as XLS with YYYY-MM-DD date format
    • Download Isolates as "Sequences (DNA) as FASTA"
    • Select all DNA
    • Set FASTA header to 0: DNA Accession no., 1: Isolate name, 2: Isolate ID, 3: Segment, 4: Passage details/history, 5: Submitting lab
  2. Move files to nextstrain-db/data as gisaid_epiflu.xls and gisaid_epiflu.fasta.
  3. Upload to vdb database
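
Before uploading, it can help to check that the downloaded FASTA headers carry the six fields in the order configured above. The following is a minimal sketch, not the vdb parser itself. It assumes the fields are separated by "|" (the notes do not state the delimiter), and the dictionary key names and the example header are hypothetical, chosen only for illustration.

```python
# Field order follows the FASTA header settings listed in step 1.
# ASSUMPTION: these key names are illustrative, not the vdb schema.
HEADER_FIELDS = [
    "accession",       # 0: DNA Accession no.
    "strain",          # 1: Isolate name
    "isolate_id",      # 2: Isolate ID
    "locus",           # 3: Segment
    "passage",         # 4: Passage details/history
    "submitting_lab",  # 5: Submitting lab
]


def parse_header(line, sep="|"):
    """Split one FASTA header line into a dict keyed by HEADER_FIELDS.

    ASSUMPTION: fields are separated by '|'; pass a different `sep`
    if the downloaded file uses another delimiter.
    """
    parts = [p.strip() for p in line.lstrip(">").strip().split(sep)]
    if len(parts) != len(HEADER_FIELDS):
        raise ValueError(
            f"expected {len(HEADER_FIELDS)} fields, got {len(parts)}: {line!r}"
        )
    return dict(zip(HEADER_FIELDS, parts))


# Made-up header for illustration only; not a real GISAID record.
example = ">EPI000001 | A/Example/1/2015 | EPI_ISL_000001 | HA | E2 | Example Lab"
record = parse_header(example)
print(record["strain"], record["locus"])
```

A header with a missing or extra field raises ValueError, which flags a download made with the wrong header settings before it reaches the database.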

Update documents in VDB

  • Update genetic grouping fields
    • python vdb/ -db vdb -v flu --update_groupings
    • updates vtype, subtype, lineage

Download documents from VDB

  • python vdb/ -db vdb -v flu --select locus:HA lineage:seasonal_h3n2 --fstem h3n2
  • python vdb/ -db vdb -v flu --select locus:HA lineage:seasonal_h1n1pdm --fstem h1n1pdm
  • python vdb/ -db vdb -v flu --select locus:HA lineage:seasonal_vic --fstem vic
  • python vdb/ -db vdb -v flu --select locus:HA lineage:seasonal_yam --fstem yam
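
The four download commands above differ only in lineage and file stem, so they can be generated from a small table. This is a sketch that only prints the commands (a dry run). The helper name vdb_download_command is hypothetical, and the flags are copied from the commands above rather than from the vdb source.

```python
import shlex

# lineage -> file stem, copied from the four commands above
LINEAGES = {
    "seasonal_h3n2": "h3n2",
    "seasonal_h1n1pdm": "h1n1pdm",
    "seasonal_vic": "vic",
    "seasonal_yam": "yam",
}


def vdb_download_command(lineage, fstem, locus="HA"):
    """Build the argument list for one vdb download (hypothetical helper)."""
    return [
        "python", "vdb/", "-db", "vdb", "-v", "flu",
        "--select", f"locus:{locus}", f"lineage:{lineage}",
        "--fstem", fstem,
    ]


commands = [vdb_download_command(lin, stem) for lin, stem in LINEAGES.items()]

# Dry run: print each command as it would be typed in the shell.
for cmd in commands:
    print(shlex.join(cmd))
```

To execute rather than print, each list could be passed to subprocess.run(cmd, check=True) from the nextstrain-db directory.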


Upload documents to TDB

Raw tables from NIMR reports

  1. Convert NIMR report PDFs to CSV files
  2. Move CSV files to the subtype directory in nextstrain-db/data/
  3. Upload to tdb database

Flat files

  1. Move line-list tsv files to nextstrain-db/data/
  2. Upload to tdb database with python tdb/ -db tdb -v flu --subtype h3n2 --ftype flat --fstem H3N2_HI_titers_upload

Download documents from TDB

  • python tdb/ -db tdb -v flu --ftype augur --subtype h3n2
  • python tdb/ -db tdb -v flu --ftype augur --subtype h1n1pdm
  • python tdb/ -db tdb -v flu --ftype augur --subtype vic
  • python tdb/ -db tdb -v flu --ftype augur --subtype yam
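
The tdb downloads can be batched per subtype as well. The sketch below returns the planned commands and runs them only when dry_run is False. The function names, the dry_run switch, and the cwd parameter are assumptions made for illustration; the flags themselves are taken from the commands above.

```python
import shlex
import subprocess

# Subtypes copied from the four commands above
SUBTYPES = ("h3n2", "h1n1pdm", "vic", "yam")


def tdb_download_command(subtype, ftype="augur"):
    """Build the argument list for one tdb download (hypothetical helper)."""
    return [
        "python", "tdb/", "-db", "tdb", "-v", "flu",
        "--ftype", ftype, "--subtype", subtype,
    ]


def download_all(subtypes=SUBTYPES, dry_run=True, cwd="."):
    """Return the commands as shell strings; execute them unless dry_run."""
    planned = []
    for subtype in subtypes:
        cmd = tdb_download_command(subtype)
        planned.append(shlex.join(cmd))
        if not dry_run:
            # ASSUMPTION: cwd is the nextstrain-db checkout containing tdb/
            subprocess.run(cmd, check=True, cwd=cwd)
    return planned


planned = download_all()  # dry run by default; nothing is executed
for line in planned:
    print(line)
```

Defaulting to a dry run keeps the sketch safe to paste into a session; check=True makes a failed download stop the batch instead of continuing silently.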