AlphaFold

This is the landing page for AlphaFold. It contains examples, tutorials and other contributed resources for using AlphaFold on the UFS HPC.

AlphaFold uses machine learning to predict the three-dimensional structure of a protein from a supplied protein sequence in FASTA format.
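Before submitting a job it can help to confirm how many sequences the input file holds. In FASTA format, each sequence starts with a header line beginning with ">". The following is a minimal sketch, not part of the UFS HPC tooling; the function name count_fasta_records is hypothetical.

```shell
# count_fasta_records FILE
# Print the number of sequences in a FASTA file by counting
# the header lines, which start with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}

# Example (hypothetical file name):
#   count_fasta_records my_protein.fasta
```

As a rule of thumb, a file with a single record suits the monomer presets, while the multimer preset is intended for files with several records.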

For more information about the parameters used by AlphaFold, consult the parameter reference and the AlphaFold GitHub page.

Please note that AlphaFold does not provide scripts for processing its output outside of its notebooks. However, the UFS HPC staff have written a script that produces the following processed output in the processed_output directory inside the result directory:

  • Plots of "pLDDT versus Residue Index" for all models
  • CSV files containing the data used to construct the plots above (one for each model)
  • A CSV file that contains the rank order of the models according to global pLDDT scores

Additionally, for the multimer and monomer_ptm model presets:

  • Heatmap plots with PAE values for each model
  • CSV files containing the data used to construct the PAE heatmap plots

This output processing script can be executed after running the main AlphaFold script.
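To illustrate how a global pLDDT score relates to the per-residue data, here is a minimal sketch. It assumes that the global score is the mean of the per-residue pLDDT values, and that a per-model CSV file has one header row with the pLDDT value in the second column. Both the column layout and the function name mean_plddt are assumptions; check the actual files under processed_output before relying on it.

```shell
# mean_plddt FILE
# Print the mean of the second CSV column, skipping the header row.
# ASSUMPTION: column 2 holds the per-residue pLDDT value.
mean_plddt() {
    awk -F',' '
        NR > 1 && NF >= 2 { sum += $2; n++ }
        END {
            if (n > 0) printf "%.2f\n", sum / n
            else exit 1          # no data rows found
        }' "$1"
}

# Example (hypothetical file name):
#   mean_plddt results_test/processed_output/model_1.csv
```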

For more information about the significance of the output produced, consult the AlphaFold GitHub page and the AlphaFold publications.

UFS HPC Usage Example

AlphaFold runs on the UFS HPC within an interactive PBS job.

Follow these steps to submit an AlphaFold job:

  1. Create a screen session:

    $ screen -S *use an appropriate name here*
    
  2. Load the hpc_scripts module:

    $ module load hpc_scripts/latest
    
  3. Start an interactive PBS job using qwiz:

    [Image: alphafold_1]

    Please refer to the Performance Notes below when deciding how many resources to request.

  4. Create a directory and copy your input files to this directory.

  5. Unload the current CUDA library:

    $ module unload cuda
    
  6. Load a newer version of the CUDA library (11.2 works well):

    $ module load cuda/11.2
    
  7. Prepare the GPU for execution by executing this line in the terminal (copy and paste):

    [ -e "$PBS_GPUFILE" ] && export CUDA_VISIBLE_DEVICES=$(cat "$PBS_GPUFILE"|sed "s|.*gpu||g"|tr '\n' ','|sed "s|,$||g")
    
  8. Now load the AlphaFold module:

    $ module load life-sciences/alphafold/2.1.2
    
  9. Activate the conda environment for AlphaFold to run in:

    $ alphafold_init
    
  10. Now you may run AlphaFold by using the provided run_alphafold.sh script. An example is shown below:

    $ run_alphafold.sh -o results_test -f zikv_EDIII.fasta -t 2022-03-15 -m monomer -c full_dbs -a $CUDA_VISIBLE_DEVICES -d $AF2_DB
    

    For the meaning of each flag, please refer to the parameter reference and the AlphaFold GitHub page. Also note that the -a and -d flags should be used as above unless there is a specific reason not to.

  11. The AlphaFold run may take some time to complete. You may close the terminal and reattach the session later to check on the job. Refer to the screen documentation for how to reattach a screen session.

  12. When AlphaFold has finished executing, you may run the processing script provided by the UFS HPC to produce relevant graphs and data from the output:

    $ process_alphafold_output.py run --af2_results results_test --nr_models 5 --afmodel monomer
    

    Note that the processing script does not require GPU resources.

  13. Remember to quit the interactive PBS job to free its resources for other users on the UFS HPC:

    $ exit
    
  14. Finally, delete the screen session (again, refer to the screen documentation).
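The GPU preparation line in step 7 can be unpacked into a small function to show what each stage does. This is a sketch under one assumption: that the file named by $PBS_GPUFILE lists one assigned GPU per line in a form such as node01-gpu0. The function name gpu_ids_from_file is hypothetical.

```shell
# gpu_ids_from_file FILE
# Turn a list of assigned GPUs (one per line, e.g. "node01-gpu0")
# into a comma-separated list of GPU indices (e.g. "0,1").
gpu_ids_from_file() {
    # Stop early when no file name is given or the file does not exist.
    [ -n "$1" ] && [ -e "$1" ] || return 1

    # 1. Strip everything up to and including "gpu" on each line.
    # 2. Join the remaining indices with commas.
    # 3. Remove the trailing comma left by the join.
    sed 's|.*gpu||' "$1" | tr '\n' ',' | sed 's|,$||'
}

# Example, inside an interactive PBS job:
#   export CUDA_VISIBLE_DEVICES=$(gpu_ids_from_file "$PBS_GPUFILE")
```

The result is the value passed to run_alphafold.sh through the -a flag in step 10.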

Performance Notes

  • 1 GPU
  • 16 - 32 CPU cores
  • 1 Node
  • 16 GB Memory

  • Note that AlphaFold only runs on 1 GPU. More than 1 GPU is useful only for very large sequences (~1000 residues), where the additional GPUs are used for their memory alone. If you experience memory-related errors, try assigning one additional GPU at a time until the problem is solved.
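To judge whether an input falls into the "very large" range, the residue count of each sequence can be checked before requesting resources. The following is a minimal sketch; the function name fasta_lengths is hypothetical, and it simply counts the characters on the sequence lines of each record.

```shell
# fasta_lengths FILE
# Print "<header><TAB><residue count>" for every record in a FASTA file.
fasta_lengths() {
    awk '
        /^>/ {
            # A new header: report the previous record, then reset.
            if (name != "") print name "\t" len
            name = substr($0, 2)
            len = 0
            next
        }
        {
            # Sequence line: ignore whitespace and carriage returns.
            gsub(/[ \t\r]/, "")
            len += length($0)
        }
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Example, using the input file from the usage example:
#   fasta_lengths zikv_EDIII.fasta
```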

Benchmarks

UFS HPC Community Guides and Tutorials

  • No community guides are available.

Official site and documentation

Licensing Information

Primary citation

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021). https://doi.org/10.1038/s41586-021-03819-2

If the multimer model is used, additionally cite:

Evans, R., O'Neill, M., Pritzel, A. et al. Protein complex prediction with AlphaFold-Multimer. bioRxiv, 2021.10.04.463034 (2021). https://doi.org/10.1101/2021.10.04.463034

Please remember to cite any additional methods used. Check the log file after each run for these additional citations.

External Guides and Resources

  • No external guides are available.
  • If you know of a guide/tutorial that you have found useful, please help us share it by contacting the HPC staff at hpc@ufs.ac.za