Genomics Tutorial - Genome Annotation
Introduction
In this section you will predict and annotate genes in your assembly using Prokka, and then explore the results in a genome viewer.
Overview
The part of the workflow covered in this section is marked in red in the workflow overview figure.
Learning outcomes
After completing this section of the tutorial you should be able to:
- Use bioinformatics tools to perform gene prediction
- Use genome-viewing software to graphically explore genome annotations and NGS data overlays
Set up the environment
Follow these steps to set up the conda environment for this section:
- Open a new terminal and load the workshops/workshops/genomics_workshop_annot module:
$ module load workshops/workshops/genomics_workshop_annot
- Activate the conda environment:
$ gen_annot_init
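Before continuing, it can help to confirm that the commands used in this section are available. The sketch below defines a hypothetical helper, check_tools, that reports whether each named command is on your PATH. It assumes that gen_annot_init makes prokka and igv available, which is how the later steps use them.

```shell
# check_tools is a hypothetical helper (not part of the workshop module).
# It prints "found" or "missing" for each command name given as an argument
# and returns a non-zero status if any of them is not on the PATH.
check_tools() {
  local missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, after activating the environment:
$ check_tools prokka igv gzip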
The Data
Let's look at our directory structure in ~/genomics_tutorial so far:
genomics_tutorial
├── assembly
│   ├── quast
│   │   ├── basic_stats
│   │   └── icarus_viewers
│   ├── spades-150
│   │   ├── corrected
│   │   │   └── configs
│   │   ├── K21
│   │   │   ├── configs
│   │   │   └── simplified_contigs
│   │   ├── K33
│   │   │   ├── configs
│   │   │   └── simplified_contigs
│   │   ├── K55
│   │   │   ├── configs
│   │   │   └── simplified_contigs
│   │   ├── K77
│   │   │   ├── configs
│   │   │   └── path_extend
│   │   ├── misc
│   │   ├── mismatch_corrector
│   │   │   ├── contigs
│   │   │   │   └── configs
│   │   │   └── scaffolds
│   │   │       └── configs
│   │   ├── pipeline_state
│   │   └── tmp
│   └── spades-original
│       ├── corrected
│       │   └── configs
│       ├── K21
│       │   ├── configs
│       │   └── simplified_contigs
│       ├── K33
│       │   ├── configs
│       │   └── simplified_contigs
│       ├── K55
│       │   ├── configs
│       │   └── simplified_contigs
│       ├── K77
│       │   ├── configs
│       │   └── path_extend
│       ├── misc
│       ├── mismatch_corrector
│       │   ├── contigs
│       │   │   └── configs
│       │   └── scaffolds
│       │       └── configs
│       ├── pipeline_state
│       └── tmp
├── data
├── kraken
├── krona
│   └── taxonomy
├── mappings
│   ├── evol1.sorted.dedup_stats
│   │   ├── css
│   │   ├── images_qualimapReport
│   │   └── raw_data_qualimapReport
│   └── ref_genome
├── quality_control
│   ├── data
│   ├── multiqc_data
│   ├── trimmed
│   └── trimmed-fastqc
└── variants
    └── plots

67 directories
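The listing above appears to be the output of tree -d, which shows directories only (an assumption, since the command itself is not given). If tree is not installed, the hypothetical helper count_dirs below gives a comparable directory count using find. Note that find also counts hidden directories, which tree skips by default, so the two numbers can differ.

```shell
# count_dirs is a hypothetical helper. It counts every directory below the
# given path; the starting directory itself is not counted (-mindepth 1).
# tr strips the padding some wc implementations add to their output.
count_dirs() {
  find "$1" -mindepth 1 -type d | wc -l | tr -d ' '
}
```

For example:
$ count_dirs ~/genomics_tutorial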
Annotation with Prokka
We will annotate our assembled genome using Prokka.
New Tool
Prokka
A software tool that rapidly annotates prokaryotic genomes
To perform an annotation on our assembled genome, execute the following command:
#Run Prokka on the spades-150 scaffolds (from inside ~/genomics_tutorial)
$ prokka --kingdom Bacteria --genus Escherichia --species coli --outdir annotation assembly/spades-150/scaffolds.fasta
Your results will be in the annotation directory, in files whose names start with the prefix PROKKA_ followed by the date of the run.
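For a quick overview of what Prokka found, you can tally the feature types in its GFF file. The hypothetical helper count_features below is a minimal sketch that assumes a GFF3 file with the feature type in column 3 and, as Prokka writes it, the contig sequences appended after a ##FASTA line.

```shell
# count_features is a hypothetical helper. It prints one line per feature
# type (GFF column 3) with the number of features of that type, sorted by type.
count_features() {
  awk -F '\t' '
    /^##FASTA/ { exit }          # sequence section follows: no more features
    /^#/       { next }          # skip directives and comments
    NF >= 9    { n[$3]++ }       # column 3 holds the feature type
    END        { for (t in n) print t "\t" n[t] }
  ' "$1" | sort
}
```

For example, assuming the annotation directory holds a single .gff file:
$ count_features annotation/PROKKA_*.gff
Prokka also writes summary statistics of the annotated features to the .txt file in the same directory, which makes a useful cross-check.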
Interactive viewing
We will use the software Integrative Genomics Viewer (IGV) to view the assembly, the genome annotation, and the variants that you have called, all in one window.
New Tool
Integrative Genomics Viewer (IGV)
An easy-to-use interactive tool for the visual exploration of genomic data
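IGV places features and variants on the genome by sequence name, so the names in column 1 of the GFF and VCF files generally need to match the FASTA headers of the assembly you load as the genome. The hypothetical helper missing_seqids below prints any such name that does not occur in the FASTA file. It assumes the FASTA sequence name is the header text up to the first whitespace, which is the usual convention.

```shell
# missing_seqids is a hypothetical helper.
# Usage: missing_seqids <genome.fasta> <features.gff|variants.vcf>
# Prints each distinct column-1 name in the feature file that is absent from
# the FASTA headers; no output means every name matched.
missing_seqids() {
  local fasta=$1 features=$2
  awk -F '\t' '
    # First file (FASTA): remember each sequence name.
    NR == FNR {
      if (/^>/) { name = substr($0, 2); sub(/[ \t].*/, "", name); seen[name] = 1 }
      next
    }
    /^##FASTA/ { exit }               # Prokka GFF: sequences follow, no more features
    /^#/ || NF < 2 { next }           # skip headers, directives and malformed lines
    !($1 in seen) && !($1 in reported) { print $1; reported[$1] = 1 }
  ' "$fasta" "$features"
}
```

For example, from inside ~/genomics_tutorial (the vcf check works once the files have been extracted):
$ missing_seqids assembly/spades-150/scaffolds.fasta annotation/PROKKA_*.gff
$ missing_seqids assembly/spades-150/scaffolds.fasta variants/evol1.freebayes.filtered.vcf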
Follow these steps to view the genomic data we have generated thus far:
- Open IGV by running the igv command in the terminal. This will open a new window.
- Navigate to Genomes > Load Genome From File and load the genome assembly by selecting assembly/spades-150/scaffolds.fasta.
- Next, to load our variant calling data, we first need to extract the vcf files we compressed earlier. To do this, run the following:
#First change into the directory where the variant data is located (from ~/genomics_tutorial)
$ cd variants
#Extract the vcf file for evol1
$ gzip -dk evol1.freebayes.filtered.vcf.gz
#Extract the vcf file for evol2
$ gzip -dk evol2.freebayes.filtered.vcf.gz
- Next, load each of the extracted vcf files in turn by navigating to File > Load from File and selecting each file.
- Finally, load the Prokka annotation by navigating to File > Load from File and selecting the gff file inside the annotation directory.
- You can now select different contigs and zoom in and out on the sequence.
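The two gzip commands in the steps above can also be wrapped in a loop that skips files that were already extracted and reports how many variant records each file holds, a quick sanity check before loading them into IGV. prepare_vcfs is a hypothetical helper; it assumes the <sample>.freebayes.filtered.vcf.gz naming used above and must be run from inside the variants directory.

```shell
# prepare_vcfs is a hypothetical helper.
# Usage: prepare_vcfs <sample> [<sample> ...]   (run inside the variants directory)
prepare_vcfs() {
  local sample gz vcf
  for sample in "$@"; do
    gz="$sample.freebayes.filtered.vcf.gz"
    vcf="${gz%.gz}"
    if [ ! -f "$gz" ]; then
      echo "missing: $gz" >&2
      return 1
    fi
    # Keep the .gz original (-k); skip extraction if the .vcf already exists.
    [ -f "$vcf" ] || gzip -dk "$gz"
    # Variant records are the lines that do not start with "#".
    echo "$vcf: $(grep -vc '^#' "$vcf") records"
  done
}
```

For example:
$ prepare_vcfs evol1 evol2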