Visit the workflow documentation for instructions on how to set up and run the workflow.
Releasing new workflow versions
We use semantic versioning of the ncov workflow, denoting backward-incompatible changes with major versions. Before merging a pull request that introduces a new backward-incompatible change (e.g., requiring a new version of Augur), take the following steps to document these changes:
- Determine the new version number by incrementing the current version (e.g., “v2” from “v1”).
- As part of the pull request, document the change(s) from the pull request in docs/src/reference/change_log.md with the current date and new version number.
- Merge the pull request.
- Create a new GitHub release using the new version as the tag (e.g., “v2”) and release title. Leave the release description empty.
We do not release new minor versions for new features, but you should document new features in the change log as part of the corresponding pull request under a heading for the date those features are merged.
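The versioning and release steps above can be sketched in shell. This is a minimal sketch, assuming a POSIX shell and the GitHub CLI (gh) for creating releases; next_major_version and release_command are hypothetical helpers, not part of the ncov repository. The script only prints the gh command so that you can review it and run it yourself after the pull request is merged.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: increment a major-version tag such as "v1" -> "v2".
# Only plain "vN" tags are accepted, since minor versions are not released.
next_major_version() {
  current="$1"
  major="${current#v}"   # strip the leading "v"
  case "$major" in
    ''|*[!0-9]*)
      echo "error: expected a tag like v1, got '$current'" >&2
      return 1
      ;;
  esac
  echo "v$((major + 1))"
}

# Hypothetical helper: print (not run) a GitHub CLI command that creates a
# release whose tag and title are both the new version, with an empty description.
release_command() {
  version="$1"
  printf 'gh release create %s --title %s --notes ""\n' "$version" "$version"
}

new_version="$(next_major_version v1)"
echo "$new_version"
release_command "$new_version"
```

Printing rather than executing the release command keeps the sketch safe to run while drafting a pull request; the release itself should only be created after the merge.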
Running Core Nextstrain Builds
The “core” Nextstrain builds consist of a global analysis and six regional analyses, performed independently for GISAID data and for open data (currently, the open data is GenBank data). Stepping back, the process can be broken into three steps:
- Ingest and curation of raw data. This is performed by the ncov-ingest repo, and the resulting files are uploaded to S3 buckets.
- Phylogenetic builds, which start from the files produced by the previous step. This is performed by the profiles nextstrain_profiles/nextstrain-open and nextstrain_profiles/nextstrain-gisaid. The resulting files are uploaded to S3 buckets by the upload rule.
Manually running phylogenetic builds
To run these pipelines locally, without uploading the results:
snakemake -pf all --profile nextstrain_profiles/nextstrain-open
snakemake -pf all --profile nextstrain_profiles/nextstrain-gisaid
You can replace the all target with a specific output, for instance auspice/ncov_open_global.json, to avoid building every region.
The resulting dataset(s) can be visualised in the browser by running auspice view --datasetDir auspice.
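The choice of profile and target for a local run can be wrapped in a small helper. This is a minimal sketch, assuming a POSIX shell run from the repository root; build_command is a hypothetical helper, not part of the ncov repository, and it prints the snakemake invocation shown above instead of running it, so you can inspect it first.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the snakemake command for a local build.
#   $1: data source, "open" or "gisaid"
#   $2: target (defaults to "all"; e.g. auspice/ncov_open_global.json)
build_command() {
  source_name="$1"
  target="${2:-all}"
  case "$source_name" in
    open|gisaid) ;;
    *)
      echo "error: data source must be 'open' or 'gisaid', got '$source_name'" >&2
      return 1
      ;;
  esac
  echo "snakemake -pf $target --profile nextstrain_profiles/nextstrain-$source_name"
}

build_command open
build_command open auspice/ncov_open_global.json
build_command gisaid
```

To actually run a build, pass the printed line to the shell, for example eval "$(build_command open auspice/ncov_open_global.json)", and then view the output with auspice view --datasetDir auspice.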
If you wish to upload the resulting information, you should run the upload and deploy rules:
- The upload rule uploads the resulting files, including intermediate files, to specific S3 buckets; this rule uses the S3_DST_BUCKET config parameter.
- The deploy rule uploads the dataset files so that they are accessible via Nextstrain URLs (e.g. nextstrain.org/ncov/gisaid/global); this rule uses the deploy_url and auspice_json_prefix config parameters.
You may wish to override these parameters for your local runs to avoid overwriting data that is already present.
For instance, here are the commands used by the trial builds action (see below):
snakemake -pf upload deploy \
  --profile nextstrain_profiles/nextstrain-open \
  --config \
    S3_DST_BUCKET=nextstrain-staging/files/ncov/open/trial/TRIAL_NAME \
    deploy_url=s3://nextstrain-staging/ \
    auspice_json_prefix=ncov_open_trial_TRIAL_NAME

snakemake -pf upload deploy \
  --profile nextstrain_profiles/nextstrain-gisaid \
  --config \
    S3_DST_BUCKET=nextstrain-ncov-private/trial/TRIAL_NAME \
    deploy_url=s3://nextstrain-staging/ \
    auspice_json_prefix=ncov_gisaid_trial_TRIAL_NAME
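The trial-build commands above differ only in the data source and the trial name, so they can be generated from those two values. This is a minimal sketch, assuming a POSIX shell; trial_command is a hypothetical helper, not part of the ncov repository, and the restriction of trial names to letters, digits, "-" and "_" is a hypothetical safeguard, since the name ends up in S3 paths and dataset names. It prints the command rather than running it.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: print the upload/deploy command for a trial build,
# using the same config overrides as the trial-build commands.
#   $1: data source, "open" or "gisaid"
#   $2: trial name (hypothetical restriction: [A-Za-z0-9_-] only)
trial_command() {
  source_name="$1"
  trial_name="$2"
  case "$trial_name" in
    ''|*[!A-Za-z0-9_-]*)
      echo "error: invalid trial name '$trial_name'" >&2
      return 1
      ;;
  esac
  case "$source_name" in
    open)   bucket="nextstrain-staging/files/ncov/open/trial/$trial_name" ;;
    gisaid) bucket="nextstrain-ncov-private/trial/$trial_name" ;;
    *)
      echo "error: data source must be 'open' or 'gisaid', got '$source_name'" >&2
      return 1
      ;;
  esac
  # echo joins its arguments with single spaces, giving one command line.
  echo "snakemake -pf upload deploy" \
    "--profile nextstrain_profiles/nextstrain-$source_name" \
    "--config" \
    "S3_DST_BUCKET=$bucket" \
    "deploy_url=s3://nextstrain-staging/" \
    "auspice_json_prefix=ncov_${source_name}_trial_$trial_name"
}

trial_command open my-trial
trial_command gisaid my-trial
```

Rejecting an empty trial name here is deliberate: it avoids accidentally producing paths that end in a bare "trial/".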
Triggering routine builds
Typically, everything’s triggered from the
After updating the intermediate files, that command will run the phylogenetic ncov pipelines (step 3, above), force-requiring the rules
Triggering trial builds
This repository contains GitHub Actions rebuild-open, which can be manually run via github.com.
These will run the respective phylogenetic build pipelines starting from the preprocessed (filtered) files.
This will ask for an optional “trial name” and upload intermediate files to nextstrain-staging/files/ncov/open/trial/$TRIAL_NAME; if you don’t supply one, you will overwrite the files at nextstrain-data/files/ncov/open, as well as the trees at
The GitHub Action follows along with the AWS job so that you can monitor its progress; as of October 2021, each action took around 3 hours.
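Because omitting the trial name overwrites the production files, it can help to check the destination before triggering the action. This is a minimal sketch, assuming a POSIX shell; open_files_destination is a hypothetical helper that only reproduces the path logic described above for the open build and does not interact with GitHub or AWS.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: where the open trial build's intermediate files go.
# A non-empty trial name selects the staging bucket; an empty one falls
# through to the production location, which would be overwritten.
open_files_destination() {
  trial_name="${1:-}"
  if [ -n "$trial_name" ]; then
    echo "nextstrain-staging/files/ncov/open/trial/$trial_name"
  else
    echo "nextstrain-data/files/ncov/open"
  fi
}

open_files_destination my-trial

dest="$(open_files_destination "")"
if [ "$dest" = "nextstrain-data/files/ncov/open" ]; then
  echo "warning: no trial name given; files at $dest would be overwritten" >&2
fi
```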