BLAST+ Tutorial - Searching BLAST Databases

Introduction

In the previous section we created a custom nucleic acid sequence BLAST database from a multiple FASTA sequence file. In this section we will use the BLAST+ search tool to query the database with an unknown input sequence.

Because we have a nucleic acid sequence database and a (SPOILER) nucleic acid query sequence, we will use the blastn tool from the BLAST+ suite.

Searching the VFDB BLAST database using blastn

Follow these steps to search the VFDB BLAST database with blastn:

  1. cd into ~/blast_tutorial

  2. Confirm that the query FASTA file contains a nucleic acid sequence by viewing it with more:

    $ more query.faa
    
  3. Show the program options for the blastn tool:

    $ blastn -help
    
  4. Next, use blastn to search the custom BLAST database, created in the previous section, with the query.faa sequence:

    $ blastn -query query.faa \
             -db vfdb_setb_nt/vfdb_setb_nt \
             -out result.xml \
             -outfmt 5
    

    Where the options used are defined as follows:

    Option:   -query
    Value:    Path to the input FASTA file containing the query sequence
    Comments: The default format for input is a FASTA formatted file

    Option:   -db
    Value:    The path to the BLAST Database to search against
    Comments: Remember that the final element of the path should be the shared prefix discussed in the previous section

    Option:   -out
    Value:    The output file which will store the results of the BLAST search
    Comments: Match the extension used with the output format you wish to use for the results.

    Option:   -outfmt
    Value:    Number that specifies the format to use for the BLAST results
    Comments: The number 5 specifies the XML output format, which is recommended as it is required by many downstream tools. Read the BLAST+ documentation to find out more about the other output formats available.

  5. Use more to inspect the result.xml file. Can you find the following information about the top hit?

    • The description of the matched sequence
    • The accession number of the matched sequence
    • The bit-score and e-value
    • The sequence identity with the matched sequence
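
If paging through the XML by hand gets tedious, the fields listed in step 5 can also be pulled out with standard shell tools. The following is a minimal sketch, not part of the BLAST+ suite: top_hit_summary is a hypothetical helper name, and it assumes that each element sits on its own line (as in blastn XML output) and that hits are reported best-first, so the first occurrence of each element belongs to the top hit.

```shell
# top_hit_summary FILE
# Hypothetical helper: print the first occurrence of selected elements
# from a BLAST XML (-outfmt 5) result file, one "tag: value" pair per line.
# Assumption: the first <Hit> in the file is the top hit.
top_hit_summary() {
    for tag in Hit_def Hit_accession Hsp_bit-score Hsp_evalue Hsp_identity Hsp_align-len; do
        # sed prints only lines containing <tag>value</tag>, rewritten as
        # "tag: value"; head keeps the first such line.
        sed -n "s|.*<${tag}>\(.*\)</${tag}>.*|${tag}: \1|p" "$1" | head -n 1
    done
}

# Usage, assuming result.xml is in the current directory:
#   top_hit_summary result.xml
```

Note that Hsp_identity is the number of identical positions, not a percentage. Dividing it by Hsp_align-len and multiplying by 100 gives the percent identity over the aligned region.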

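Going back to step 2, the check that the query is a nucleic acid sequence can also be done without reading the file by eye. The sketch below uses a hypothetical helper, is_nucleotide_fasta, which tests whether every sequence line consists only of IUPAC nucleotide codes. It is a rough heuristic under stated assumptions: it expects Unix line endings and no gap or stop characters, and a protein made up entirely of those letters would pass as well.

```shell
# is_nucleotide_fasta FILE
# Hypothetical helper: succeed (exit status 0) if every non-header line of
# a FASTA file contains only IUPAC nucleotide codes, in either case.
is_nucleotide_fasta() {
    # Drop header lines (those starting with '>'), then search the rest
    # for any character outside the nucleotide alphabet.
    if grep -v '^>' "$1" | grep -q '[^ACGTUNRYKMSWBDHVacgtunrykmswbdhv]'; then
        return 1
    fi
    return 0
}

# Usage, assuming query.faa is in the current directory:
#   is_nucleotide_fasta query.faa && echo "looks like a nucleotide sequence"
```
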
Congratulations! You have successfully completed the tutorial!