Trinotate

This is the landing page for Trinotate. It contains examples, tutorials and other contributed resources for using Trinotate on the UFS HPC.

Trinotate is an annotation pipeline that can be used in the functional annotation of transcriptomes (particularly de novo assembled ones) from model or non-model organisms.

Trinotate is made available via a conda environment that can be loaded as a module.

Please note the following:

  • Trinotate integrates with Trinity, which is already available on the UFS HPC but is not included in this environment. The integration consists simply of Trinotate loading the output files produced by Trinity.

  • Other tools such as TransDecoder, SignalP, TMHMM and RNAMMER (which uses HMMER version 2.x) are included and can be executed directly within the environment.

  • HMMER version 3 is also available in the environment and is the version that runs under the default executable names (e.g. hmmsearch). To run the version 2.x executables of HMMER, simply suffix the command with 2 (e.g. hmmsearch2).

  • It is recommended to run the script autoTrinotate.pl wherever possible, as shown in the usage example below. It is also possible to run each step of the pipeline individually, but note the point below if you do.

  • Trinotate was installed via the bioconda package, which, in contrast to the officially distributed package, places all Trinotate scripts in the bin directory of the environment. The consequence is that all scripts can be called directly, which is convenient. However, some scripts that call other scripts still contain hard-coded paths to those scripts. The fix is straightforward and has been applied to all the components called by autoTrinotate.pl, but the issue may still be present in other utility scripts. Please let the UFS HPC team know when you encounter such a script and we will fix it.
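
Once the module is loaded, a quick way to confirm which executables the environment provides is to ask the shell where each name resolves. The following is a minimal sketch, not part of the UFS installation: report_tool is a hypothetical helper name, and the list contains only the executable names mentioned in the notes above.

```shell
#!/usr/bin/env bash
# report_tool NAME: print where NAME resolves on PATH, or that it is missing.
# (Hypothetical helper for illustration; it is not shipped with Trinotate.)
report_tool() {
    local name path
    name="$1"
    if path="$(command -v "$name" 2>/dev/null)"; then
        printf '%-20s %s\n' "$name" "$path"
    else
        printf '%-20s %s\n' "$name" "not found on PATH"
    fi
}

# hmmsearch should resolve to HMMER 3, hmmsearch2 to HMMER 2.x.
for tool in hmmsearch hmmsearch2 autoTrinotate.pl; do
    report_tool "$tool"
done
```

If a name is reported as not found, the Trinotate module is probably not loaded in that terminal.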

UFS HPC Usage Example

Trinotate is available on the UFS HPC in the form of a conda environment that is loaded via environment modules. It is assumed that the most common usage of the pipeline will be within an interactive qwiz or qvnc session, so the usage example is written for these scenarios.

Follow these steps to use the pipeline:

  1. Start a qwiz or qvnc session. If using a qvnc session, open the Terminal.

  2. Load the Trinotate module (current version is 3.2.2):

    $ module load life-sciences/trinotate/3.2.2
    
  3. Initialize the Trinotate environment:

    $ trinotate_init
    
  4. Next, create a directory in your home directory in which to perform the analysis, and cd to it.

  5. Now copy (or link) the fasta file and gene-to-transcript file obtained as output from Trinity to your analysis directory. As examples, these will be expr_1_trinity_assembly.fasta and expr1_trinity_assembly.gene_to_trans_map respectively.

  6. Next, obtain a boilerplate Trinotate database. The official documentation generates this through several steps, but your friendly neighbourhood UFS HPC staff member has pre-generated it for you. Simply issue the following command in your analysis directory to get the file:

    $ get_trinotate_db
    
  7. The file rename-this-file.sqlite will be present in your analysis directory after following step 6. Rename this file appropriately, for example: expr_1.sqlite

  8. Obtain the configuration file (ufs_trinotate_conf.txt) for autoTrinotate.pl by issuing the following command in your analysis directory:

    $ get_trinotate_conf
    
  9. Now we can execute the autoTrinotate.pl script as follows:

    $ autoTrinotate.pl  --Trinotate_sqlite expr_1.sqlite \
                        --transcripts expr_1_trinity_assembly.fasta \
                        --gene_to_trans_map expr1_trinity_assembly.gene_to_trans_map \
                        --conf ufs_trinotate_conf.txt \
                        --CPU <number of cores you reserved via PBS>
    

    Note: If your run is interrupted, you can simply run the script again and the analysis will continue from the step where the interruption occurred.

  10. To explore the Trinotate results, TrinotateWeb can be used. If this is the first time you are using TrinotateWeb, install the web server into your home directory (you only need to install it once):

    $ install_trinotate_web
    

    This should create a trinotate_web directory in your home directory.
    Note: TrinotateWeb can only be used within a qvnc session

  11. Now start the TrinotateWeb server:

    $ run_TrinotateWebserver.pl 8080
    

    Note: You need to keep this terminal window active while using TrinotateWeb

  12. Now open a web browser (Chromium is recommended), and navigate to the following address:

    http://localhost:8080/cgi-bin/index.cgi
    
  13. You should be taken to a page requesting the path to the Trinotate sqlite database generated by the annotation pipeline. Note that the full path needs to be entered directly.
    Hint: Navigate to the analysis directory in a separate terminal and issue the pwd command. Copy the path (select it and press Shift-Ctrl-C), paste it into the box, and append the filename to it.

    [Screenshot: trinotateweb_1]

    Click on the Submit button

  14. The database will be loaded (this may take some time depending on its size) and if successful should load the Overview page as shown below:

    [Screenshot: trinotateweb_2]
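
Two small checks can save time in the steps above: confirming that all four input files are in place before starting autoTrinotate.pl, and printing the full database path that TrinotateWeb asks for. The sketch below assumes the example filenames used in this guide. check_inputs and db_full_path are hypothetical helper names, not part of Trinotate or the UFS module.

```shell
#!/usr/bin/env bash
# check_inputs FILE...: return 1 and list every file that is missing or empty.
check_inputs() {
    local missing f
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# db_full_path FILE: print the absolute path of FILE, ready to paste
# into the TrinotateWeb database prompt.
db_full_path() {
    local dir
    dir="$(cd "$(dirname "$1")" && pwd)" || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# Run from the analysis directory, using the example names from this guide.
if check_inputs expr_1.sqlite \
                expr_1_trinity_assembly.fasta \
                expr1_trinity_assembly.gene_to_trans_map \
                ufs_trinotate_conf.txt; then
    echo "All inputs present. Database path for TrinotateWeb:"
    db_full_path expr_1.sqlite
fi
```

The -s test treats an empty file as missing, which catches a copy or download that was interrupted before any data was written.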

Performance Notes

  • 16 - 32 CPU cores
  • 1 Node
  • 32 GB Memory
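
The value passed to --CPU should match the number of cores actually reserved for the session. The following is a minimal sketch for reading that number from the job environment. It assumes a PBS-style scheduler: NCPUS is set by PBS Professional/OpenPBS, while PBS_NP and a per-core PBS_NODEFILE are TORQUE conventions. Which of these the UFS HPC sets inside a qwiz or qvnc session is an assumption to verify. reserved_cores is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# reserved_cores: print the core count of the current PBS job,
# falling back to 1 when no scheduler variable is set.
reserved_cores() {
    if [ -n "${NCPUS:-}" ]; then
        echo "$NCPUS"                        # PBS Professional / OpenPBS
    elif [ -n "${PBS_NP:-}" ]; then
        echo "$PBS_NP"                       # TORQUE
    elif [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d ' '  # TORQUE: typically one line per core
    else
        echo 1
    fi
}

echo "Suggested option: --CPU $(reserved_cores)"
```

If the printed value is 1 inside a session that reserved more cores, none of the assumed variables is set and the core count should be entered by hand.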

Benchmarks

No benchmarks are available.

UFS HPC Community Guides and Tutorials

  • No community guides are available.

Official site and documentation

Licensing Information

Primary citation

Bryant, D.M., Johnson, K., DiTommaso, T., Tickle, T., Couger, M.B., Payzin-Dogru, D., Lee, T.J., Leigh, N.D., Kuo, T.H., Davis, F.G. and Bateman, J., 2017. A tissue-mapped axolotl de novo transcriptome enables identification of limb regeneration factors. Cell Reports, 18(3), pp.762-776.

Pipeline component citations

HMMER

Eddy SR (2008) A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation. PLoS Comput Biol 4(5): e1000069. https://doi.org/10.1371/journal.pcbi.1000069

PFAM

Jaina Mistry, Sara Chuguransky, Lowri Williams, Matloob Qureshi, Gustavo A Salazar, Erik L L Sonnhammer, Silvio C E Tosatto, Lisanna Paladin, Shriya Raj, Lorna J Richardson, Robert D Finn, Alex Bateman, Pfam: The protein families database in 2021, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D412–D419, https://doi.org/10.1093/nar/gkaa913

SignalP

Nielsen H. (2017) Predicting Secretory Proteins with SignalP. In: Kihara D. (eds) Protein Function Prediction. Methods in Molecular Biology, vol 1611. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7015-5_6

TMHMM

Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 2001 Jan 19;305(3):567-80.

BLAST

Altschul SF; Gish W; Miller W; Myers EW; Lipman DJ. Basic local alignment search tool. J Mol Biol 215: 403-10 (1990)

KEGG

Minoru Kanehisa, Susumu Goto, Yoko Sato, Miho Furumichi, Mao Tanabe, KEGG for integration and interpretation of large-scale molecular data sets, Nucleic Acids Research, Volume 40, Issue D1, 1 January 2012, Pages D109–D114, https://doi.org/10.1093/nar/gkr988

GO

Ashburner, M., Ball, C., Blake, J. et al. Gene Ontology: tool for the unification of biology. Nat Genet 25, 25–29 (2000). https://doi.org/10.1038/75556

eggNOG

Sean Powell, Damian Szklarczyk, Kalliopi Trachana, Alexander Roth, Michael Kuhn, Jean Muller, Roland Arnold, Thomas Rattei, Ivica Letunic, Tobias Doerks, Lars J. Jensen, Christian von Mering, Peer Bork, eggNOG v3.0: orthologous groups covering 1133 organisms at 41 different taxonomic ranges, Nucleic Acids Research, Volume 40, Issue D1, 1 January 2012, Pages D284–D289, https://doi.org/10.1093/nar/gkr1060

RNAMMER

Karin Lagesen, Peter Hallin, Einar Andreas Rødland, Hans-Henrik Stærfeldt, Torbjørn Rognes, David W. Ussery, RNAmmer: consistent and rapid annotation of ribosomal RNA genes, Nucleic Acids Research, Volume 35, Issue 9, 1 May 2007, Pages 3100–3108, https://doi.org/10.1093/nar/gkm160

Please remember to cite any additional methods used.

External Guides and Resources