Kaiju

Kaiju is a program for sensitive taxonomic classification of high-throughput sequencing reads from metagenomic whole genome sequencing or metatranscriptomics experiments.

Kaiju is available on the UFS HPC as a conda environment that can be loaded as a module.

UFS HPC Usage Example

Follow these steps to use Kaiju:

  1. Start a qwiz or qvnc session. If using a qvnc session, open the Terminal.

  2. Load the Kaiju module (current version is 1.9.0):

    $ module load life-sciences/kaiju

  3. Initialize the Kaiju environment:

    $ kaiju_init

  4. Kaiju can now be run as usual. The databases are accessible through the environment variable $KAIJU_DB.

  5. To show the databases available for use (and when they were generated), issue the following command:

    $ kaiju_dbs_avail

  6. To use a database, always prefix the database paths with the environment variable $KAIJU_DB, and tab-complete the directory and file names. As an example, the command below uses the viruses database:

    $ kaiju -z 16 -t $KAIJU_DB/viruses_2022-03-29/nodes.dmp \
            -f $KAIJU_DB/viruses_2022-03-29/kaiju_db_viruses.fmi \
            -i kaiju-testdata/sars-cov-2_1.fastq.gz -o sars-cov-2_1.out
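
Because both database files live in the same directory, it can help to keep that directory in a variable and confirm the files are readable before starting a long run. The following is only a minimal sketch and not part of Kaiju: the helper name kaiju_db_ok and the variable DB_DIR are hypothetical, and the file names are taken from the viruses example above.

```shell
# Hypothetical helper: succeed only if both files Kaiju needs are readable.
# $1 = database directory, $2 = name of the .fmi index inside that directory.
kaiju_db_ok() {
    db_dir=$1
    fmi_name=$2
    for f in "$db_dir/nodes.dmp" "$db_dir/$fmi_name"; do
        if [ ! -r "$f" ]; then
            echo "missing or unreadable: $f" >&2
            return 1
        fi
    done
    return 0
}

# Usage sketch, mirroring the viruses example (paths assumed from that example):
# DB_DIR="$KAIJU_DB/viruses_2022-03-29"
# kaiju_db_ok "$DB_DIR" kaiju_db_viruses.fmi &&
#     kaiju -z 16 -t "$DB_DIR/nodes.dmp" -f "$DB_DIR/kaiju_db_viruses.fmi" \
#           -i kaiju-testdata/sars-cov-2_1.fastq.gz -o sars-cov-2_1.out
```

The check fails early with a message naming the missing file, which is usually easier to diagnose than an error raised partway through a job.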

Performance Notes

The -z flag sets the number of parallel threads. Set it to the number of cores requested for the job.
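
One way to keep -z in step with the job request is to derive it from a variable instead of typing the number twice. The sketch below assumes the scheduler exports the core count as NCPUS; that name is an assumption and may differ on this system. The fallback to the machine's online core count is only appropriate when the whole node is yours. The function name kaiju_threads is hypothetical.

```shell
# Choose a value for kaiju -z.
# NCPUS is an ASSUMPTION: some schedulers export it inside a job, others use
# a different name. If it is unset, fall back to the cores online on this host.
kaiju_threads() {
    if [ -n "${NCPUS:-}" ]; then
        echo "$NCPUS"
    else
        getconf _NPROCESSORS_ONLN
    fi
}

THREADS=$(kaiju_threads)
echo "kaiju would be started with -z $THREADS"
```

The resulting value can then be passed as -z "$THREADS" in the kaiju command line.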

  • Request 1 node. Kaiju is multithreaded and runs within a single node.
  • These types of applications are usually memory bound. For Kaiju the database used will determine the amount of memory needed. The following table is reproduced from the manual:

    Database    Sequences (approx.)  Min. memory (GB)  Description
    refseq      98 M                 67                Completely assembled and annotated reference genomes of Archaea, Bacteria, and viruses from the NCBI RefSeq database.
    progenomes  41 M                 30                Representative set of genomes from the proGenomes database and viruses from the NCBI RefSeq database.
    viruses     0.6 M                0.4               Only viruses from the NCBI RefSeq database.
    plasmids    3.7 M                2.2               Plasmid sequences from the NCBI RefSeq database.
    fungi       4.4 M                4.2               Fungi sequences from the NCBI RefSeq database.
    nr          249 M                148               Subset of NCBI BLAST nr database containing all proteins belonging to Archaea, Bacteria and Viruses.
    nr_euk      277 M                168               Like option -s nr and additionally include proteins from fungi and microbial eukaryotes
    rvdb        10.5 M               17                Protein sequences from RVDB-prot
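
The minimum-memory column can be turned into a quick check before submitting a job. This is a rough sketch under stated assumptions: the function names kaiju_min_mem_gb and kaiju_mem_ok are hypothetical, the numbers are copied from the table above, and the listed minimum covers the database only, so a real request may need headroom on top of it.

```shell
# Minimum memory (GB) per database, copied from the table above.
# Prints nothing and returns 1 for names that are not in the table.
kaiju_min_mem_gb() {
    case $1 in
        refseq)     echo 67 ;;
        progenomes) echo 30 ;;
        viruses)    echo 0.4 ;;
        plasmids)   echo 2.2 ;;
        fungi)      echo 4.2 ;;
        nr)         echo 148 ;;
        nr_euk)     echo 168 ;;
        rvdb)       echo 17 ;;
        *)          return 1 ;;
    esac
}

# Succeeds if the requested memory in GB ($2) covers the minimum for database $1.
# Returns 2 for an unknown database. awk is used because the table contains
# fractional values, which plain shell arithmetic cannot compare.
kaiju_mem_ok() {
    need=$(kaiju_min_mem_gb "$1") || return 2
    awk -v have="$2" -v need="$need" 'BEGIN { exit !(have + 0 >= need + 0) }'
}

kaiju_mem_ok nr 64 || echo "64 GB is below the minimum listed for nr"
```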

Benchmarks

No benchmarks are available.

UFS HPC Community Guides and Tutorials

  • No community guides are available.

Official site and documentation

Licensing Information

Primary citation

Menzel, P., Ng, K. & Krogh, A. Fast and sensitive taxonomic classification for metagenomics with Kaiju. Nat Commun 7, 11257 (2016)

Please remember to cite any additional methods used.

External Guides and Resources