AlphaFold
This is the landing page for AlphaFold, which contains examples, tutorials and other contributed resources for using AlphaFold on the UFS HPC.
AlphaFold uses machine learning to predict the three-dimensional structure of a protein from a supplied protein sequence in FASTA format.
For more information about the parameters used by AlphaFold, consult the parameter reference and the AlphaFold GitHub page.
Please note that AlphaFold does not provide scripts for processing its output outside of its notebooks. However, the UFS HPC team has written a script that produces the following processed output in the processed_output directory inside the results directory:
- Plots of "pLDDT versus Residue Index" for all models
- CSV files containing the data used to construct the plots above (one for each model)
- A CSV file containing the rank order of the models according to their global pLDDT scores
Additionally, for the multimer and monomer_ptm model presets:
- Heatmap plots of the PAE (predicted aligned error) values for each model
- CSV files containing the data used to construct the PAE heatmap plots
This output processing script can be executed after running the main AlphaFold script.
For more information about the significance of the output produced, consult the AlphaFold GitHub page and the AlphaFold publications.
UFS HPC Usage Example
AlphaFold runs on the UFS HPC within an interactive PBS job.
Follow these steps to submit an AlphaFold job:
-
Create a screen session:
$ screen -S *use an appropriate name here*
-
Load the hpc_scripts module:
$ module load hpc_scripts/latest
-
Start an interactive PBS job using qwiz.
Refer to the Performance Notes below when deciding how many resources to request.
-
Create a directory and copy your input files to this directory.
-
Unload the current CUDA library:
$ module unload cuda
-
Load a newer version of the CUDA library (11.2 works well):
$ module load cuda/11.2
-
Prepare the GPU for execution by running this line in the terminal (copy and paste it). It sets CUDA_VISIBLE_DEVICES to the GPU(s) that PBS allocated to your job:
[ -e "$PBS_GPUFILE" ] && export CUDA_VISIBLE_DEVICES=$(cat "$PBS_GPUFILE"|sed "s|.*gpu||g"|tr '\n' ','|sed "s|,$||g")
-
Now load the AlphaFold module:
$ module load life-sciences/alphafold/2.1.2
-
Activate the conda environment for AlphaFold to run in:
$ alphafold_init
-
Now you may run AlphaFold by using the provided run_alphafold.sh script. An example is shown below:
$ run_alphafold.sh -o results_test -f zikv_EDIII.fasta -t 2022-03-15 -m monomer -c full_dbs -a $CUDA_VISIBLE_DEVICES -d $AF2_DB
Refer to the parameter reference and the AlphaFold GitHub page for the meaning of each flag. Note that the -a and -d flags should be used exactly as shown above unless you have a specific reason to change them.
-
The AlphaFold run may take some time to complete. You may close the terminal and reattach the screen session later to check on the job. Refer to the screen documentation for how to reattach a screen session.
-
When AlphaFold has finished executing, you may run the processing script provided by the UFS HPC to produce relevant graphs and data from the output:
$ process_alphafold_output.py run --af2_results results_test --nr_models 5 --afmodel monomer
Note that the processing script does not require GPU resources.
-
Remember to exit the interactive PBS job to free up its resources for other users on the UFS HPC:
$ exit
-
Finally, delete the screen session. (Again, refer to the screen documentation).
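The GPU preparation line and the -a/-d flags in the steps above depend on two environment variables, CUDA_VISIBLE_DEVICES and AF2_DB. The sketch below is a hedged illustration, not a UFS HPC tool, and both function names are hypothetical. `gpu_ids_from_file` applies the same sed/tr pipeline as the GPU preparation line, assuming each line of $PBS_GPUFILE ends in `gpu<N>` (for example `node01-gpu0`). `preflight` checks that CUDA_VISIBLE_DEVICES is non-empty and that AF2_DB points to an existing directory, assuming AF2_DB is set by the AlphaFold module. This matters because an empty CUDA_VISIBLE_DEVICES makes the unquoted `-a $CUDA_VISIBLE_DEVICES` vanish, so `-a` would most likely receive `-d` as its value.

```shell
#!/usr/bin/env bash
# Hedged sketch: sanity checks before calling run_alphafold.sh.
# Function names are hypothetical; they are not part of the UFS HPC modules.

# Same pipeline as the GPU preparation line, applied to any file.
# ASSUMPTION: every line of the file ends in "gpu<N>", e.g. "node01-gpu0".
gpu_ids_from_file() {
  # sed keeps only what follows the last "gpu" on each line ("0", "1", ...),
  # tr joins the lines with commas, and the last sed drops the trailing comma.
  sed 's|.*gpu||' "$1" | tr '\n' ',' | sed 's|,$||'
}

# Returns non-zero (and explains why) if a variable used by -a or -d is missing.
# ASSUMPTION: AF2_DB is set by the life-sciences/alphafold module.
preflight() {
  local status=0
  if [ -z "${CUDA_VISIBLE_DEVICES:-}" ]; then
    echo "CUDA_VISIBLE_DEVICES is empty: run the GPU preparation line first" >&2
    status=1
  fi
  if [ -z "${AF2_DB:-}" ] || [ ! -d "$AF2_DB" ]; then
    echo "AF2_DB is unset or not a directory: load the AlphaFold module first" >&2
    status=1
  fi
  return "$status"
}

# Demonstration on a mock GPU file (does not touch the real $PBS_GPUFILE).
mock_gpufile=$(mktemp)
printf 'node01-gpu0\nnode01-gpu1\n' > "$mock_gpufile"
echo "GPU ids: $(gpu_ids_from_file "$mock_gpufile")"   # prints "GPU ids: 0,1"
rm -f "$mock_gpufile"
```

Inside the interactive job you could then prefix the example command as `preflight && run_alphafold.sh ...` (same arguments as above), so that AlphaFold is only launched when both variables look sensible.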
Performance Notes
Recommended resources per job
- 1 GPU
- 16-32 CPU cores
- 1 node
- 16 GB memory
Note that AlphaFold computes on only one GPU. More than one GPU is useful only for very large sequences (around 1000 residues or more), where the additional GPUs contribute their memory alone. If you encounter out-of-memory errors, try requesting additional GPUs one at a time until the problem is resolved.
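To judge whether an input falls into the "very large sequence" range mentioned above, you can count residues before requesting resources. The following is a hedged sketch using awk; the function name `count_residues` is hypothetical and the FASTA content in the demonstration is made up. It prints the length of each sequence and the total; for a multimer input, where the chains are modelled together, the total is likely the more relevant figure.

```shell
#!/usr/bin/env bash
# Hedged sketch: count residues per sequence in a FASTA file.
# count_residues is a hypothetical helper, not part of the AlphaFold module.
count_residues() {
  awk '
    /^>/ {                          # header line: report the previous sequence
      if (name != "") print name, n
      name = substr($1, 2); n = 0   # sequence name is the first word minus ">"
      next
    }
    {
      gsub(/[ \t\r]/, "")           # ignore stray spaces, tabs and CR characters
      n += length($0); total += length($0)
    }
    END {
      if (name != "") print name, n
      print "total", total + 0
    }
  ' "$1"
}

# Demonstration with a made-up two-chain FASTA file.
demo_fasta=$(mktemp)
printf '>chainA\nMKTAYIAKQR\nQISFVKSHFS\n>chainB\nGSHMAS\n' > "$demo_fasta"
count_residues "$demo_fasta"
# prints:
# chainA 20
# chainB 6
# total 26
rm -f "$demo_fasta"
```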
Benchmarks
UFS HPC Community Guides and Tutorials
- No community guides are available.
Official site and documentation
Licensing Information
-
The AlphaFold code is licensed under the Apache License, Version 2.0
-
The AlphaFold parameters are licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0)
-
The mirrored databases have the following licenses:
- BFD (unmodified), BFD (modified by DeepMind), Uniclust30: v2018_08 - Creative Commons Attribution-ShareAlike 4.0 International License
- MGnify: v2018_12 (unmodified) - CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Primary citation
If the multimer model is used, additionally cite:
Please remember to cite any additional methods used. Check the log file after each run for these additional citations.
External Guides and Resources
- No external guides are available.
- If you know of a guide or tutorial that you have found useful, please help us share it by contacting the HPC staff at hpc@ufs.ac.za.