Kaiju
Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing or metatranscriptomics experiments.
Kaiju is made available on the UFS HPC via a conda environment which can be loaded as a module.
UFS HPC Usage Example
Follow these steps to use Kaiju:
- Start a qwiz or qvnc session. If using a qvnc session, open the Terminal.
- Load the Kaiju module (current version is 1.9.0):
  $ module load life-sciences/kaiju
- Initialize the Kaiju environment:
  $ kaiju_init
- Kaiju can now be executed normally, with the databases accessible via the environment variable $KAIJU_DB.
- To show the databases available for use (and when they were generated), issue the following command:
  $ kaiju_dbs_avail
- To use a database, always use the environment variable $KAIJU_DB as a prefix (and tab-complete the directory and file names) when specifying the database paths. As an example, the command below uses the viruses database:
  $ kaiju -z 16 -t $KAIJU_DB/viruses_2022-03-29/nodes.dmp -f $KAIJU_DB/viruses_2022-03-29/kaiju_db_viruses.fmi -i kaiju-testdata/sars-cov-2_1.fastq.gz -o sars-cov-2_1.out
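The database paths in the command above can be assembled and checked before Kaiju starts. The following is a minimal sketch, not part of Kaiju or the UFS HPC module: `kaiju_run` is a hypothetical wrapper name, and it assumes that each database directory under $KAIJU_DB contains a `nodes.dmp` file and a single `.fmi` index, as in the viruses example.

```shell
# Hypothetical wrapper (not provided by Kaiju or the kaiju module).
# Usage: kaiju_run <db_dir_name> <reads.fastq.gz> <output_file> [threads]
kaiju_run() {
    local db_dir nodes fmi f

    # KAIJU_DB is only set after the module is loaded and kaiju_init has run.
    if [ -z "${KAIJU_DB:-}" ]; then
        echo "KAIJU_DB is not set; load the module and run kaiju_init first" >&2
        return 1
    fi

    db_dir="$KAIJU_DB/$1"
    nodes="$db_dir/nodes.dmp"
    fmi=""

    # Assumption: the directory holds one .fmi index; take the first match.
    for f in "$db_dir"/*.fmi; do
        if [ -f "$f" ]; then
            fmi="$f"
            break
        fi
    done

    if [ ! -f "$nodes" ] || [ -z "$fmi" ]; then
        echo "nodes.dmp or .fmi index not found in $db_dir" >&2
        return 1
    fi

    # Same flags as the documented example: -z threads, -t nodes, -f index,
    # -i input reads, -o output file. Threads default to 1 if not given.
    kaiju -z "${4:-1}" -t "$nodes" -f "$fmi" -i "$2" -o "$3"
}
```

With this function defined in the session, the viruses example could be run as `kaiju_run viruses_2022-03-29 kaiju-testdata/sars-cov-2_1.fastq.gz sars-cov-2_1.out 16`.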
Performance Notes
The -z flag should be set to the number of cores requested for the job.
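One way to keep -z in step with the requested cores is to read the count from the job environment instead of typing it by hand. The sketch below is illustrative only: `kaiju_threads` is a hypothetical helper, and the variables `NCPUS` (set by some PBS installations) and `SLURM_CPUS_PER_TASK` (set by Slurm) are assumptions about the scheduler, not something this page confirms for the UFS HPC. If neither is set, it falls back to the number of cores visible on the node.

```shell
# Hypothetical helper: print a thread count suitable for kaiju -z.
kaiju_threads() {
    if [ -n "${NCPUS:-}" ]; then
        # Assumption: the scheduler exports NCPUS for the job.
        echo "$NCPUS"
    elif [ -n "${SLURM_CPUS_PER_TASK:-}" ]; then
        # Assumption: Slurm-style variable, used only if present.
        echo "$SLURM_CPUS_PER_TASK"
    else
        # Fallback: cores visible on this node, which may exceed the request.
        nproc 2>/dev/null || getconf _NPROCESSORS_ONLN
    fi
}
```

The result can then be passed straight to Kaiju, for example `kaiju -z "$(kaiju_threads)"` followed by the usual -t, -f, -i and -o arguments.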
Recommended resources per session
- 1 Node
- These types of applications are usually memory bound. For Kaiju, the database used will determine the amount of memory needed. The following table is reproduced from the manual:
Database | Description | Approximate Number of Sequences | Minimum Memory Required (GB) |
---|---|---|---|
refseq | Completely assembled and annotated reference genomes of Archaea, Bacteria, and viruses from the NCBI RefSeq database. | 98 M | 67 |
progenomes | Representative set of genomes from the proGenomes database and viruses from the NCBI RefSeq database. | 41 M | 30 |
viruses | Only viruses from the NCBI RefSeq database. | 0.6 M | 0.4 |
plasmids | Plasmid sequences from the NCBI RefSeq database. | 3.7 M | 2.2 |
fungi | Fungi sequences from the NCBI RefSeq database. | 4.4 M | 4.2 |
nr | Subset of the NCBI BLAST nr database containing all proteins belonging to Archaea, Bacteria and Viruses. | 249 M | 148 |
nr_euk | Like nr, and additionally includes proteins from fungi and microbial eukaryotes. | 277 M | 168 |
rvdb | Protein sequences from RVDB-prot. | 10.5 M | 17 |
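The memory figures in the table can be turned into a quick check before requesting a session. This is a minimal sketch with hypothetical helper names (`kaiju_min_mem_gb`, `kaiju_mem_ok`). The values are copied from the table above and are minimums, so the memory actually requested should leave some headroom.

```shell
# Hypothetical helper: minimum memory (GB) per database, from the table above.
kaiju_min_mem_gb() {
    case "$1" in
        refseq)     echo 67 ;;
        progenomes) echo 30 ;;
        viruses)    echo 0.4 ;;
        plasmids)   echo 2.2 ;;
        fungi)      echo 4.2 ;;
        nr)         echo 148 ;;
        nr_euk)     echo 168 ;;
        rvdb)       echo 17 ;;
        *)          echo "unknown database: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed if <available_gb> covers the database minimum.
# Usage: kaiju_mem_ok <database> <available_gb>
kaiju_mem_ok() {
    kaiju_min_mem_gb "$1" >/dev/null || return 2
    # awk handles the fractional values (0.4, 2.2, 4.2) that test(1) cannot.
    awk -v need="$(kaiju_min_mem_gb "$1")" -v have="$2" \
        'BEGIN { exit !(have + 0 >= need + 0) }'
}
```

For example, `kaiju_mem_ok nr_euk 128 || echo "request more memory"` would print the warning, since nr_euk needs at least 168 GB.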
Benchmarks
No benchmarks are available.
UFS HPC Community Guides and Tutorials
- No community guides are available.
Official site and documentation
Licensing Information
Kaiju is free to use and distribute under the terms of the GNU General Public License v3.0.
Primary citation
Please remember to cite any additional methods used.
External Guides and Resources
- Viral Metagenome from a dolphin sample: hunting for a disease causing virus
- If you know of a guide/tutorial that you have found useful, please help us share it by contacting the HPC staff at hpc@ufs.ac.za