Trinotate
This is the landing page for Trinotate, which contains examples, tutorials and other contributed resources for using Trinotate on the UFS HPC.
Trinotate is an annotation pipeline that can be used in the functional annotation of transcriptomes (particularly de novo assembled ones) from model or non-model organisms.
Trinotate is made available via a conda environment that can be loaded as a module.
Please note the following:
-
Trinotate integrates with Trinity, which is already available on the UFS HPC but is not included in the Trinotate environment. The integration consists simply of loading the output files produced by Trinity.
-
Other tools such as TransDecoder, SignalP, TMHMM and RNAmmer (which uses HMMER version 2.x) are included and can be executed directly within the environment.
-
HMMER version 3 is also available in the environment and is the version that runs under the default executable names (e.g. hmmsearch). To run the version 2.x executables of HMMER, append 2 to the command name (e.g. hmmsearch2).
-
It is recommended to run the script autoTrinotate.pl wherever possible; this is shown in the usage example below. It is also possible to run each step of the pipeline individually, but take note of the point below.
-
Trinotate was installed via the bioconda package, which, in contrast to the officially distributed package, places all Trinotate scripts in the bin directory of the environment. The consequence is that all scripts can be called directly (which is convenient). However, some scripts that call other scripts still contain the hard-coded paths to those scripts. The fix is straightforward and has been implemented in all the components called by autoTrinotate.pl, but these issues may still be present in other utility scripts. Please let the UFS HPC team know when you encounter such a script and we will apply the fix.
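The notes above say that the bundled tools, including both HMMER generations, can be called directly once the environment is loaded. A minimal sketch of how this could be confirmed before starting an analysis is given here. The helper name check_tools is hypothetical (it is not part of Trinotate or the UFS HPC tooling), and it only reports what is on the PATH, not which version of a tool will run.

```shell
# Report whether each named executable can be found on the current PATH.
# Prints one "found"/"MISSING" line per tool; the exit status is non-zero
# if at least one tool is missing.
# check_tools is a hypothetical helper, not provided by the module.
check_tools() (
    status=0
    for tool in "$@"; do
        if path=$(command -v "$tool"); then
            printf 'found    %s -> %s\n' "$tool" "$path"
        else
            printf 'MISSING  %s\n' "$tool"
            status=1
        fi
    done
    exit "$status"
)
```

After loading the module, it could be called with the two HMMER names mentioned above, plus any other executables you rely on:
$ check_tools hmmsearch hmmsearch2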
UFS HPC Usage Example
Trinotate is available on the UFS HPC in the form of a conda environment that is loaded via environment modules. It is assumed that the pipeline will most commonly be used within an interactive qwiz or qvnc session, so the usage example is illustrated for these scenarios:
Follow these steps to use the pipeline:
-
Start a qwiz or qvnc session. If using a qvnc session, open the Terminal.
-
Load the Trinotate module (current version is 3.2.2):
$ module load life-sciences/trinotate/3.2.2
-
Initialize the Trinotate environment:
$ trinotate_init
-
Next, create a directory in your home directory in which to perform the analysis, and cd to this directory.
-
Now copy (or link) the fasta file and gene-to-transcript file obtained as output from Trinity to your analysis directory. In this example, these are expr_1_trinity_assembly.fasta and expr1_trinity_assembly.gene_to_trans_map respectively.
-
Next, obtain a boilerplate Trinotate database. In the official documentation you need to generate this through various steps; however, your friendly neighbourhood UFS HPC staff member has pre-generated it for you, so simply issue the following command in your analysis directory to get the file:
$ get_trinotate_db
-
The file rename-this-file.sqlite will be present in your analysis directory after following step 6. Rename this file appropriately, for example: expr_1.sqlite
-
Obtain the configuration file (ufs_trinotate_conf.txt) for autoTrinotate.pl by issuing the following command in your analysis directory:
$ get_trinotate_conf
-
Now we can execute the autoTrinotate.pl script as follows:
$ autoTrinotate.pl --Trinotate_sqlite expr_1.sqlite --transcripts expr_1_trinity_assembly.fasta --gene_to_trans_map expr1_trinity_assembly.gene_to_trans_map --conf ufs_trinotate_conf.txt --CPU <number of cores you reserved via PBS>
Note: If your run is interrupted, you can simply run the script again and the analysis will continue from the step where the interruption occurred.
-
To explore the Trinotate results, TrinotateWeb can be used. If this is the first time you are using TrinotateWeb, install the web server into your home directory (you only need to install it once):
$ install_trinotate_web
This should create a trinotate_web directory in your home directory.
Note: TrinotateWeb can only be used within a qvnc session.
-
Now start the TrinotateWeb server:
$ run_TrinotateWebserver.pl 8080
Note: You need to keep this terminal window active while using TrinotateWeb
-
Now open a web browser (Chromium is recommended), and navigate to the following address:
http://localhost:8080/cgi-bin/index.cgi
-
You should be taken to a page requesting the path to the Trinotate sqlite database generated by the annotation pipeline. Note that the full path needs to be entered directly.
Hint: Navigate to the analysis directory in a separate terminal and issue the pwd command. Copy the path (select it and press Shift-Ctrl-c), paste it into the box, and append the filename to it.
Click on the Submit button.
-
The database will be loaded (this may take some time depending on its size) and, if successful, the Overview page will be displayed.
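The file-handling steps above (creating the analysis directory, linking the Trinity outputs, assembling the autoTrinotate.pl command and finding the full database path for TrinotateWeb) can be sketched as a few small shell functions. This is only an illustration under stated assumptions: the function names prepare_analysis_dir, autotrinotate_cmd and full_path are hypothetical helpers, not part of Trinotate or the UFS HPC tooling. The command is printed for review rather than executed, and only the autoTrinotate.pl options already shown in the steps above are used.

```shell
# Hypothetical helpers sketching the file-handling steps of the usage example.

# Create the analysis directory and symlink the two Trinity outputs into it.
# usage: prepare_analysis_dir <analysis_dir> <fasta> <gene_to_trans_map>
prepare_analysis_dir() (
    dir=$1 fasta=$2 map=$3
    for f in "$fasta" "$map"; do
        [ -f "$f" ] || { echo "missing input: $f" >&2; exit 1; }
    done
    mkdir -p "$dir" || exit 1
    # Use absolute targets so the links stay valid from inside $dir.
    for f in "$fasta" "$map"; do
        abs=$(cd "$(dirname "$f")" && pwd)/$(basename "$f")
        ln -sf "$abs" "$dir/" || exit 1
    done
)

# Print (do not run) the autoTrinotate.pl command line after checking that
# all four input files exist in the current directory.
# usage: autotrinotate_cmd <sqlite> <fasta> <gene_to_trans_map> <conf> <cpus>
autotrinotate_cmd() (
    [ "$#" -eq 5 ] || { echo "expected 5 arguments" >&2; exit 2; }
    for f in "$1" "$2" "$3" "$4"; do
        [ -e "$f" ] || { echo "missing file: $f" >&2; exit 1; }
    done
    printf 'autoTrinotate.pl --Trinotate_sqlite %s --transcripts %s --gene_to_trans_map %s --conf %s --CPU %s\n' \
        "$1" "$2" "$3" "$4" "$5"
)

# Print the full path of a file in the current directory, ready to paste
# into the TrinotateWeb form.
# usage: full_path <file>
full_path() (
    [ -e "$1" ] || { echo "missing file: $1" >&2; exit 1; }
    printf '%s/%s\n' "$(pwd)" "$1"
)
```

With the file names from the example, and assuming 16 reserved cores, the command could be reviewed from the analysis directory like this:
$ autotrinotate_cmd expr_1.sqlite expr_1_trinity_assembly.fasta expr1_trinity_assembly.gene_to_trans_map ufs_trinotate_conf.txt 16
$ full_path expr_1.sqlite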
Performance Notes
Recommended resources per session
- 16 - 32 CPU cores
- 1 Node
- 32 GB Memory
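As a small pre-flight aid, the core count you intend to pass to --CPU can be compared with the recommended range above. The sketch below assumes you supply the number of reserved cores yourself; check_cores is a hypothetical helper and does not query the scheduler.

```shell
# Compare a reserved core count with the recommended 16-32 range.
# check_cores is a hypothetical helper; it only inspects its argument.
# usage: check_cores <cores>
check_cores() (
    case $1 in
        ''|*[!0-9]*) echo "not a whole number: $1" >&2; exit 2 ;;
    esac
    if [ "$1" -lt 16 ]; then
        echo "below recommended range (16-32): $1"
    elif [ "$1" -gt 32 ]; then
        echo "above recommended range (16-32): $1"
    else
        echo "within recommended range (16-32): $1"
    fi
)
```

For example:
$ check_cores 16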
Benchmarks
No benchmarks are available.
UFS HPC Community Guides and Tutorials
- No community guides are available.
Official site and documentation
Licensing Information
- Trinotate is free to use and distribute under the terms of the Broad Institute license
Primary citation
Pipeline component citations
HMMER
PFAM
SignalP
TMHMM
BLAST
KEGG
GO
eggNOG
RNAMMER
Please remember to cite any additional methods used.
External Guides and Resources
- De novo RNA-Seq Assembly, Annotation, and Analysis Using Trinity and Trinotate
- Running Trinotate for annotating the transcripts
- If you know of a guide/tutorial that you have found useful, please help us share it by contacting the HPC staff at hpc@ufs.ac.za