Setup and installation
The following steps will prepare you to run complete analyses of SARS-CoV-2 data by installing required software and running a simple example workflow.
1. Setup your Nextstrain environment with Conda
The following instructions use Conda to install the tools you’ll need for this tutorial. Conda is a package and environment management system that allows you to install Python and other software into controlled environments without disrupting other software you have installed (e.g., on your computer, your shared cluster, etc.).
If you use Microsoft Windows, install the Windows Subsystem for Linux (WSL). Follow instructions to open a new WSL window for your Linux distribution and then run the following commands.
Create the Nextstrain environment
Install Miniconda with Python 3 for your operating system, update Conda to the latest version, and install Mamba for faster installation.
conda update -n base conda
conda install -n base -c conda-forge mamba
Create a Conda environment named nextstrain
.
This command will install Git, Snakemake, and Nextstrain’s full toolkit, the tools you’ll need to work through this tutorial.
mamba create -n nextstrain -c conda-forge -c bioconda \
augur auspice nextstrain-cli nextalign snakemake awscli git pip
Confirm that the installation worked.
conda activate nextstrain
nextstrain check-setup --set-default
The final output from the last command should look like this:
Setting default environment to native.
If the final output from the last command is Setting default environment to docker.
, close Docker and run the last command again.
The Nextstrain CLI will prefer the Docker environment when it is available, but this tutorial assumes you are running commands in the native environment.
Update the Nextstrain environment
Update your environment regularly to the latest versions of these tools.
# Update Conda and Mamba.
mamba update -n base conda mamba
# Update tools in the Nextstrain environment.
conda activate nextstrain
mamba update --all -c conda-forge -c bioconda
2. Download the ncov workflow
Download the workflow
Use Git to download a copy of the ncov repository containing the workflow and this tutorial.
git clone https://github.com/nextstrain/ncov.git
cd ncov
Alternately, download a compressed copy of the ncov repository called ncov-master.zip
.
Open this file to decompress it and create a directory called ncov-master/
with the contents of the workflow in it.
Navigate to this directory from the command line.
Update the workflow
We update the official workflow regularly with:
curated metadata including latitudes/longitudes, clade annotations, and low quality sequences
bug fixes
Update your local copy of the workflow, to benefit from these changes.
# Download and apply changes from the Nextstrain team.
# This only works if there is no conflict with your local repository.
git pull --ff-only origin master
# OR:
# Alternately, download and apply changes from the Nextstrain team
# and then replay your local changes on top of those incoming changes.
git pull --rebase origin master
Alternately, download a specific version of the workflow that you know works for you. We create new releases of the workflow any time we introduce breaking changes, so you can choose when to update based on what has changed.
# Download version 7 (v7) of the workflow.
curl -OL https://github.com/nextstrain/ncov/archive/refs/tags/v7.zip
# Uncompress the workflow.
unzip v7.zip
# Change into the workflow's directory.
cd ncov-7/
3. Run a basic analysis with example data
Run a basic workflow with example data, to confirm that your Nextstrain environment is properly configured.
nextstrain build . --cores 4 --use-conda \
--configfile ./my_profiles/getting_started/builds.yaml
The nextstrain build
command runs a pathogen analysis or “build” defined by a Snakemake workflow in a specific directory.
Since our Snakefile
is in the current directory, we specify the build directory as .
.
All other arguments pass through to Snakemake.
The --use-conda
flag instructs Snakemake to maintain a workflow-specific Conda environment defined in workflow/envs/nextstrain.yaml
.
Snakemake will activate this environment before it runs each rule, ensuring each rule runs with the necessary software.
The first time you use this flag, Snakemake will download and install all Nextstrain tools required to run the workflow.
This can take a little time.
The next time you use this flag, Snakemake will detect the existing environment and use it immediately.
The getting_started
build produces a minimal global phylogeny for visualization in Auspice.
This workflow should complete in about 10 minutes on a MacBook Pro (2.7 GHz Intel Core i5) with four cores, including 5 minutes to create the Conda environment and 5 minutes to run the workflow.
4. Visualize the phylogeny for example data
Open http://auspice.us in your browser.
Drag and drop the JSON file auspice/ncov_global.json
anywhere on the http://auspice.us landing page, to visualize the resulting phylogeny.
The resulting phylogeny should look something like this.