Setup and installation
The following steps will prepare you to run complete analyses of SARS-CoV-2 data by installing required software and running a simple example workflow.
1. Make a copy of this tutorial
There are two ways to do this:
- [Recommended] If you’re familiar with git, clone this repository via the web interface, a GUI such as GitKraken, or the command line:
git clone https://github.com/nextstrain/ncov.git
- [Alternative] If you’re not familiar with git, you can also download a copy of these files via the buttons on the left.
2. Set up your Nextstrain environment
Create a Nextstrain conda environment with augur and auspice installed. If you do not have conda installed already, see our full installation instructions for more details. If you are running Windows, see our documentation about setting up the Windows Subsystem for Linux (WSL).
curl http://data.nextstrain.org/nextstrain.yml --compressed -o nextstrain.yml
conda env create -f nextstrain.yml
conda activate nextstrain
npm install --global auspice
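Before moving on, it can help to confirm that the installed tools are on your PATH. The following is a minimal sketch, assuming the environment provides the augur, auspice, and snakemake commands; the helper name missing_tools is hypothetical and not part of Nextstrain.

```shell
# Print the name of every listed command that is not found on the PATH.
# `missing_tools` is a hypothetical helper, not part of Nextstrain.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Run this after `conda activate nextstrain`.
# No output means every command was found.
missing_tools augur auspice snakemake
```

If a name is printed, revisit the corresponding installation step before running the workflow.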
3. Run a basic analysis with example data
Run a basic workflow with example data, to confirm that your Nextstrain environment is properly configured.
First, change into the ncov repository’s directory.
cd ncov
Then, uncompress the example sequence data we include in the repository.
gzip -d -c data/example_sequences.fasta.gz > data/example_sequences.fasta
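As an optional sanity check, you can count the sequence records before and after uncompressing. This sketch assumes the standard FASTA convention that each record starts with a ">" header line; count_fasta_records is a hypothetical helper name.

```shell
# Count FASTA records (header lines start with ">") in a gzip-compressed file.
# `count_fasta_records` is a hypothetical helper name.
count_fasta_records() {
    gzip -d -c "$1" | grep -c '^>'
}

# Compare the compressed input with the uncompressed copy, if both exist.
if [ -f data/example_sequences.fasta.gz ] && [ -f data/example_sequences.fasta ]; then
    echo "compressed:   $(count_fasta_records data/example_sequences.fasta.gz)"
    echo "uncompressed: $(grep -c '^>' data/example_sequences.fasta)"
fi
```

The two counts should match; a mismatch suggests the uncompressed file is incomplete.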
Finally, run the basic workflow with these example data.
snakemake --cores 4 --profile ./my_profiles/getting_started
The getting_started profile produces a minimal global phylogeny for visualization in auspice.
This workflow should complete in about 5 minutes on a MacBook Pro (2.7 GHz Intel Core i5) with four cores.
4. Visualize the phylogeny for example data
Go to http://auspice.us in your browser.
Drag and drop the JSON file auspice/ncov_global.json anywhere on the http://auspice.us landing page to visualize the resulting phylogeny.
Advanced reading: considerations for keeping a ‘Location Build’ up-to-date
Note: we’ll walk through what each of the referenced files does shortly.
Keeping data updated
If you are aiming to create a public health build for a state, division, or area of interest, you likely want to keep your analysis up-to-date easily. If your run contains contextual subsampling (sequences from outside of your focal area), you should first ensure that you regularly download the latest sequences as input, then re-run the build. This way, you always have a build that reflects the most recent SARS-CoV-2 information.
Keeping your workflow updated
You should also aim to keep this ncov repository updated.
If you’ve cloned the repository from GitHub, you can do this by running git pull.
This downloads any changes that we have made to the repository to your own computer.
In particular, we regularly add new colors and latitude & longitude information. These should match the new sequences you download, so that you don’t need to add this information yourself.
If you don’t need to share the contents of my_profiles (the files that parameterize your specific analysis) with anyone, then you can leave them in the ./my_profiles/ folder.
They won’t be changed when you git pull for the latest information.
However, if you want to share your profile, you’ll need to adopt one of the following solutions.
First, you can ‘fork’ the entire ncov repository, which means you have your own copy of the repository.
You can then add your profile files to the repository and anyone else can download them as part of your ‘fork’ of the repository.
Note that if you do this, you should ensure you pull regularly from the original ncov repository to keep your fork up-to-date.
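One common way to pull from the original repository into a fork is to register it as a second git remote. The sketch below assumes the conventional remote name "upstream"; ensure_upstream is a hypothetical helper, and the branch to pull is left as a placeholder because it depends on the original repository’s default branch.

```shell
# Add the original repository as a remote named "upstream" unless it exists.
# The remote name "upstream" is a common convention, assumed here.
ensure_upstream() {
    git remote get-url upstream >/dev/null 2>&1 ||
        git remote add upstream "$1"
}

# From inside your fork's working copy:
#   ensure_upstream https://github.com/nextstrain/ncov.git
#   git pull upstream <default-branch>   # check the original repository for the branch name
```

Calling the helper a second time leaves the existing remote untouched, so it is safe to keep in a regular update script.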
Alternatively, you can create a new, separate repository to hold your my_profiles files, outside of the ncov repository.
You can then share this repository with others, and it’s straightforward to keep ncov up to date, as you don’t change it at all.
If doing this, it can be easiest to create a my_profiles folder and imitate the structure found in the ./my_profiles folder, but this isn’t required.
Note that to run the build you’ll still need to run the snakemake command from within the ncov repository, but specify that the build you want is outside that folder.
For the south-usa-sarscov2 example, you can see the south-central build set up in a profiles folder.
To run this, one would call the following from within ncov:
snakemake --cores 1 --profile ../south-usa-sarscov2/profiles/south-central/
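Because the profile path is relative to the ncov directory, a mistyped path is an easy mistake to make. The following sketch checks that the profile directory exists and prints the command to run; profile_command is a hypothetical helper, and it only echoes the snakemake call rather than executing it.

```shell
# Print the snakemake command for a profile directory,
# or fail with a message if the directory does not exist.
# `profile_command` is a hypothetical helper, not part of the ncov workflow.
profile_command() {
    profile_dir="$1"
    cores="${2:-1}"   # default to one core, as in the example above
    if [ ! -d "$profile_dir" ]; then
        echo "profile directory not found: $profile_dir" >&2
        return 1
    fi
    echo "snakemake --cores $cores --profile $profile_dir"
}

# From within ncov:
#   profile_command ../south-usa-sarscov2/profiles/south-central/ 1
```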