Basecalls
This directory contains scripts related to preparing raw .fast5
reads for basecalling.
This involves moving files to consistent locations, normalizing read counts for uniform computational times, and generating summary statistics on all or a subset of libraries.
Note that this directory does not handle any reads from USVI library 2, as that library was sequenced using MiSeq, not MinION.
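As a rough illustration of what "normalizing read counts" could look like, the sketch below splits a list of file names into batches of a fixed size so that each batch takes a comparable amount of compute time. The function name split_into_batches and the batch size are hypothetical and are not taken from the scripts in this directory.

```python
def split_into_batches(paths, batch_size):
    """Split a list of paths into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Slice the list in steps of batch_size; the last batch may be smaller.
    return [paths[i:i + batch_size] for i in range(0, len(paths), batch_size)]


# Hypothetical file names, used only to show the call.
example_files = [f"read_{n}.fast5" for n in range(10)]
batches = split_into_batches(example_files, 4)
print([len(batch) for batch in batches])
```

With ten file names and a batch size of 4, this yields batches of 4, 4, and 2 files.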
Sbatch job submission
For all operations involving more than 10,000 files (everything other than grab_1000_fast5s.py), it is recommended to use Slurm to hand the job off to Rhino, rather than running in a terminal window. Our default Slurm command is:
sbatch --time=24:00:00 --mem=10000 --mail-type=END,FAIL --mail-user=<username>@fhcrc.org --wrap="<job call>"
This specifies 24 hours of compute time and 10 GB (10,000 MB) of memory for the job. Sometimes, increasing --mem or --time is necessary. The --mail-type and --mail-user options send an email when the job either fails or ends successfully. The --wrap option takes the exact command that would normally be run from the terminal, and the job inherits whichever modules are loaded at the time that it is submitted.
Many of the scripts in this directory hand off commands that move large numbers of files to Slurm, so that those commands do not depend on a terminal window staying open for the whole time that they run.
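The default command above can also be assembled from Python, which is convenient when a script needs to submit several jobs. This is a minimal sketch, not code from this directory: build_sbatch_command is a hypothetical helper, and the job call and username in the example are placeholders.

```python
import shlex


def build_sbatch_command(job_call, username, hours=24, mem_mb=10000):
    """Return the sbatch argument list matching the default command in this README."""
    return [
        "sbatch",
        f"--time={hours}:00:00",              # wall-clock limit as hours:minutes:seconds
        f"--mem={mem_mb}",                    # memory in megabytes
        "--mail-type=END,FAIL",               # email on completion or failure
        f"--mail-user={username}@fhcrc.org",
        f"--wrap={job_call}",                 # the command as it would be typed in a terminal
    ]


# Placeholder job call and username, shown only to illustrate the output.
cmd = build_sbatch_command("python prep_lib.py", "username")
print(shlex.join(cmd))

# To actually submit the job on a machine where Slurm is available:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems when the wrapped command itself contains spaces or quotes.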
Scripts
clear_workspace.py: Deletes the basecalled_reads/workspace directory for the chosen library
count_ratios.py: Looks at all libraries and quantifies the % successful demux by Albacore’s native demuxer
distribute_albacore.py: For 1d libraries, hands Albacore jobs to Slurm for every subdirectory in the raw_reads folder
grab_1000_fast5s.py: Moves 1000 fast5 files (either 1d or 2d) into a test directory for small-scale analysis
prep_lib.py: Moves files out of numbered subdirectories of basecalled_reads/workspace, deletes those numbered folders, and creates a demux folder for downstream steps
remove_tmp.py: Used for handling library8 reads that did not initially sync to Rhino correctly during the MinKNOW run
subdivide.py: Undoes prep_lib.py; used for library8, after all reads were successfully copied, to hand multiple jobs to Slurm
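To make the prep_lib.py description more concrete, here is a minimal sketch of the "move files out of numbered subdirectories, then delete those folders" step. It is an illustration under stated assumptions, not the actual script: flatten_numbered_subdirs is a hypothetical name, "numbered" is assumed to mean an all-digit directory name, and the creation of the demux folder is left out because its location is not stated here.

```python
import shutil
from pathlib import Path


def flatten_numbered_subdirs(workspace):
    """Move files from all-digit subdirectories into workspace and remove those subdirectories.

    Returns the number of files moved.
    """
    workspace = Path(workspace)
    moved = 0
    # Assumption: a "numbered" subdirectory has a name made only of digits (0, 1, 2, ...).
    numbered = sorted(p for p in workspace.iterdir() if p.is_dir() and p.name.isdigit())
    for subdir in numbered:
        for item in sorted(subdir.iterdir()):
            if not item.is_file():
                continue
            target = workspace / item.name
            if target.exists():
                # Refuse to overwrite a file that is already in the workspace.
                raise FileExistsError(f"{target} already exists")
            shutil.move(str(item), str(target))
            moved += 1
        # rmdir only succeeds on an empty directory, so nothing is deleted by accident.
        subdir.rmdir()
    return moved
```

Calling flatten_numbered_subdirs("basecalled_reads/workspace") from inside a library's directory would leave all files directly under workspace. For a full library this is the kind of call that would be wrapped in an sbatch job rather than run in a terminal.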